Foreword

This Technical Specification has been produced by the 3rd Generation Partnership Project (3GPP).

The contents of the present document are subject to continuing work within the TSG and may change following formal 
TSG approval. Should the TSG modify the contents of the present document, it will be re-released by the TSG with an 
identifying change of release date and an increase in version number as follows: 

Version x.y.z 

where: 

x the first digit:

1 presented to TSG for information; 

2 presented to TSG for approval; 

3 or greater indicates TSG approved document under change control. 

y the second digit is incremented for all changes of substance, i.e. technical enhancements, corrections, 
updates, etc. 

z the third digit is incremented when editorial-only changes have been incorporated in the document.

1 Scope



The present document describes the detailed mapping from input blocks of 160 speech samples in 13-bit uniform PCM 
format to encoded blocks of 95, 103, 118, 134, 148, 159, 204, and 244 bits and from encoded blocks of 95, 103, 118, 
134, 148, 159, 204, and 244 bits to output blocks of 160 reconstructed speech samples. The sampling rate is 
8 000 samples/s leading to a bit rate for the encoded bit stream of 4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2 or 12.2 kbit/s. 
The coding scheme for the multi-rate coding modes is the so-called Algebraic Code Excited Linear Prediction Coder, 
hereafter referred to as ACELP. The multi-rate ACELP coder is referred to as MR-ACELP. 

In the case of discrepancy between the requirements described in the present document and the fixed point 
computational description (ANSI-C code) of these requirements contained in [4], the description in [4] will prevail. The 
ANSI-C code is not described in the present document, see [4] for a description of the ANSI-C code. 

The transcoding procedure specified in the present document is mandatory for systems using the AMR speech codec.

[7] ITU-T Recommendation G.726: "40, 32, 24, 16 kbit/s Adaptive Differential Pulse Code Modulation (ADPCM)".

3 Definitions, symbols and abbreviations

3.1 Definitions 

For the purposes of the present document, the following terms and definitions apply:

adaptive codebook: contains excitation vectors that are adapted for every subframe. The adaptive codebook is derived
from the long-term filter state. The lag value can be viewed as an index into the adaptive codebook
adaptive postfilter: this filter is applied to the output of the short-term synthesis filter to enhance the perceptual quality 
of the reconstructed speech. In the adaptive multi-rate codec, the adaptive postfilter is a cascade of two filters: a formant 
postfilter and a tilt compensation filter 

algebraic codebook: fixed codebook where algebraic code is used to populate the excitation vectors (innovation 
vectors). The excitation contains a small number of nonzero pulses with predefined interlaced sets of positions 

anti-sparseness processing: adaptive post-processing procedure applied to the fixed codebook vector in order to 
reduce perceptual artefacts from a sparse fixed codebook vector 

closed-loop pitch analysis: adaptive codebook search, i.e., a process of estimating the pitch (lag) value from the 
weighted input speech and the long term filter state. In the closed-loop search, the lag is searched using error 
minimization loop (analysis-by-synthesis). In the adaptive multi-rate codec, closed-loop pitch search is performed for 
every subframe 

direct form coefficients: One of the formats for storing the short term filter parameters. In the adaptive multi-rate 
codec, all filters which are used to modify speech samples use direct form coefficients. 

fixed codebook: The fixed codebook contains excitation vectors for speech synthesis filters. The contents of the 
codebook are non-adaptive (i.e., fixed). In the adaptive multi-rate codec, the fixed codebook is implemented using an 
algebraic codebook. 

fractional lags: A set of lag values having sub-sample resolution. In the adaptive multi-rate codec a sub-sample
resolution of 1/6th or 1/3rd of a sample is used.

frame: time interval equal to 20 ms (160 samples at an 8 kHz sampling rate) 

integer lags: set of lag values having whole sample resolution 

interpolating filter: FIR filter used to produce an estimate of subsample resolution samples, given an input sampled 
with integer sample resolution 

inverse filter: this filter removes the short term correlation from the speech signal. The filter models an inverse 
frequency response of the vocal tract 

lag: long term filter delay. This is typically the true pitch period, or its multiple or sub-multiple 

Line Spectral Frequencies: (see Line Spectral Pair) 

Line Spectral Pair: transformation of LPC parameters. Line Spectral Pairs are obtained by decomposing the inverse
filter transfer function A(z) into a set of two transfer functions, one having even symmetry and the other having odd
symmetry. The Line Spectral Pairs (also called Line Spectral Frequencies) are the roots of these polynomials on the
z-unit circle

LP analysis window: for each frame, the short term filter coefficients are computed using the high pass filtered speech
samples within the analysis window. In the adaptive multi-rate codec, the length of the analysis window is always 240
samples. For each frame, two asymmetric windows are used to generate two sets of LP coefficients in the 12.2 kbit/s
mode. For the other modes, only a single asymmetric window is used to generate a single set of LP coefficients. In the
12.2 kbit/s mode, no samples of the future frames are used (no lookahead). The other modes use a 5 ms lookahead

LP coefficients: Linear Prediction (LP) coefficients (also referred to as Linear Predictive Coding (LPC) coefficients) is a
generic descriptive term for the short term filter coefficients

mode: when used alone, refers to the source codec mode, i.e., to one of the source codecs employed in the AMR codec 

open-loop pitch search: process of estimating the near optimal lag directly from the weighted speech input. This is 
done to simplify the pitch analysis and confine the closed-loop pitch search to a small number of lags around the 
open-loop estimated lags. In the adaptive multi-rate codec, an open-loop pitch search is performed in every other 
subframe 

residual: the output signal resulting from an inverse filtering operation

short term synthesis filter: this filter introduces, into the excitation signal, short term correlation which models the
impulse response of the vocal tract

perceptual weighting filter: this filter is employed in the analysis-by-synthesis search of the codebooks. The filter 
exploits the noise masking properties of the formants (vocal tract resonances) by weighting the error less in regions near 
the formant frequencies and more in regions away from them 

subframe: time interval equal to 5 ms (40 samples at 8 kHz sampling rate) 

vector quantization: method of grouping several parameters into a vector and quantizing them simultaneously 

zero input response: output of a filter due to past inputs, i.e. due to the present state of the filter, given that an input of 
zeros is applied 

zero state response: output of a filter due to the present input, given that no past inputs have been applied, i.e., given 
that the state information in the filter is all zeroes 

3.2 Symbols 

For the purposes of the present document, the following symbols apply: 

$A(z)$   The inverse filter with unquantized coefficients

$\hat{A}(z)$   The inverse filter with quantized coefficients

$H(z) = \frac{1}{\hat{A}(z)}$   The speech synthesis filter with quantized coefficients

$a_i$   The unquantized linear prediction parameters (direct form coefficients)

$\hat{a}_i$   The quantified linear prediction parameters

$m$   The order of the LP model

$\frac{1}{B(z)}$   The long-term synthesis filter

$W(z)$   The perceptual weighting filter (unquantized coefficients)

$\gamma_1$, $\gamma_2$   The perceptual weighting factors

$F_E(z)$   Adaptive pre-filter

$T$   The integer pitch lag nearest to the closed-loop fractional pitch lag of the subframe

$\beta$   The adaptive pre-filter coefficient (the quantified pitch gain)

$H_f(z) = \frac{\hat{A}(z/\gamma_n)}{\hat{A}(z/\gamma_d)}$   The formant postfilter

$\gamma_n$   Control coefficient for the amount of the formant post-filtering

$\gamma_d$   Control coefficient for the amount of the formant post-filtering

$H_t(z)$   Tilt compensation filter

$\gamma_t$   Control coefficient for the amount of the tilt compensation filtering

$\mu = \gamma_t k_1'$   A tilt factor, with $k_1'$ being the first reflection coefficient

$h_f(n)$   The truncated impulse response of the formant postfilter

$L_h$   The length of $h_f(n)$

$r_h(i)$   The auto-correlations of $h_f(n)$

$\hat{A}(z/\gamma_n)$   The inverse filter (numerator) part of the formant postfilter

$1/\hat{A}(z/\gamma_d)$   The synthesis filter (denominator) part of the formant postfilter

$\hat{r}(n)$   The residual signal of the inverse filter $\hat{A}(z/\gamma_n)$

$h_t(n)$   Impulse response of the tilt compensation filter

$\beta_{sc}(n)$   The AGC-controlled gain scaling factor of the adaptive postfilter

$\alpha$   The AGC factor of the adaptive postfilter

$H_{h1}(z)$   Pre-processing high-pass filter

$w_I(n)$, $w_{II}(n)$   LP analysis windows

$L_1^{(I)}$   Length of the first part of the LP analysis window $w_I(n)$

$L_2^{(I)}$   Length of the second part of the LP analysis window $w_I(n)$

$L_1^{(II)}$   Length of the first part of the LP analysis window $w_{II}(n)$

$L_2^{(II)}$   Length of the second part of the LP analysis window $w_{II}(n)$

$r_{ac}(k)$   The auto-correlations of the windowed speech $s'(n)$

$w_{lag}(i)$   Lag window for the auto-correlations (60 Hz bandwidth expansion)

$f_0$   The bandwidth expansion in Hz

$f_s$   The sampling frequency in Hz

$r'_{ac}(k)$   The modified (bandwidth expanded) auto-correlations

$E_{LD}(i)$   The prediction error in the $i$th iteration of the Levinson algorithm

$k_i$   The $i$th reflection coefficient

$a_j^{(i)}$   The $j$th direct form coefficient in the $i$th iteration of the Levinson algorithm

$F_1'(z)$   Symmetric LSF polynomial

$F_2'(z)$   Antisymmetric LSF polynomial

$F_1(z)$   Polynomial $F_1'(z)$ with root $z = -1$ eliminated

$F_2(z)$   Polynomial $F_2'(z)$ with root $z = 1$ eliminated

$q_i$   The line spectral pairs (LSPs) in the cosine domain

$\mathbf{q}$   An LSP vector in the cosine domain

$\hat{\mathbf{q}}_i^{(n)}$   The quantified LSP vector at the $i$th subframe of the frame $n$

$\omega_i$   The line spectral frequencies (LSFs)

$T_m(x)$   A $m$th order Chebyshev polynomial

$f_1(i)$, $f_2(i)$   The coefficients of the polynomials $F_1(z)$ and $F_2(z)$

$f_1'(i)$, $f_2'(i)$   The coefficients of the polynomials $F_1'(z)$ and $F_2'(z)$

$f(i)$   The coefficients of either $F_1(z)$ or $F_2(z)$

$C(x)$   Sum polynomial of the Chebyshev polynomials

$x$   Cosine of angular frequency $\omega$

$\lambda_k$   Recursion coefficients for the Chebyshev polynomial evaluation

$f_i$   The line spectral frequencies (LSFs) in Hz

$\mathbf{f}^t = [f_1\ f_2\ \ldots\ f_{10}]$   The vector representation of the LSFs in Hz

$\mathbf{z}^{(1)}(n)$, $\mathbf{z}^{(2)}(n)$   The mean-removed LSF vectors at frame $n$

$\mathbf{r}^{(1)}(n)$, $\mathbf{r}^{(2)}(n)$   The LSF prediction residual vectors at frame $n$

$\mathbf{p}(n)$   The predicted LSF vector at frame $n$

$\hat{\mathbf{r}}^{(2)}(n-1)$   The quantified second residual vector at the past frame

$\hat{\mathbf{f}}^k$   The quantified LSF vector at quantization index $k$

$E_{LSP}$   The LSP quantization error

$w_i$, $i = 1, \ldots, 10$   LSP-quantization weighting factors

$d_i$   The distance between the line spectral frequencies $f_{i+1}$ and $f_{i-1}$

$h(n)$   The impulse response of the weighted synthesis filter

$O_k$   The correlation maximum of open-loop pitch analysis at delay $k$

$O_{t_i}$, $i = 1, \ldots, 3$   The correlation maxima at delays $t_i$, $i = 1, \ldots, 3$

$(M_i, t_i)$, $i = 1, \ldots, 3$   The normalized correlation maxima $M_i$ and the corresponding delays $t_i$, $i = 1, \ldots, 3$

$H(z)W(z) = \frac{A(z/\gamma_1)}{\hat{A}(z) A(z/\gamma_2)}$   The weighted synthesis filter

$A(z/\gamma_1)$   The numerator of the perceptual weighting filter

$1/A(z/\gamma_2)$   The denominator of the perceptual weighting filter

$T_1$   The integer nearest to the fractional pitch lag of the previous (1st or 3rd) subframe

$s'(n)$   The windowed speech signal

$s_w(n)$   The weighted speech signal

$\hat{s}(n)$   Reconstructed speech signal

$\hat{s}'(n)$   The gain-scaled post-filtered signal

$\hat{s}_f(n)$   Post-filtered speech signal (before scaling)

$x(n)$   The target signal for adaptive codebook search

$x_2(n)$, $\mathbf{x}_2^t$   The target signal for algebraic codebook search

$res_{LP}(n)$   The LP residual signal

$c(n)$   The fixed codebook vector

$v(n)$   The adaptive codebook vector

$y(n) = v(n) * h(n)$   The filtered adaptive codebook vector

$y_k(n)$   The past filtered excitation

$u(n)$   The excitation signal

$\hat{u}(n)$   The emphasized adaptive codebook vector

$\hat{u}'(n)$   The gain-scaled emphasized excitation signal

$T_{op}$   The best open-loop lag

$t_{min}$   Minimum lag search value

$t_{max}$   Maximum lag search value

$R(k)$   Correlation term to be maximized in the adaptive codebook search

$b_{24}$   The FIR filter for interpolating the normalized correlation term $R(k)$

$R(k)_t$   The interpolated value of $R(k)$ for the integer delay $k$ and fraction $t$

$b_{60}$   The FIR filter for interpolating the past excitation signal $u(n)$ to yield the adaptive codebook vector $v(n)$

$A_k$   Correlation term to be maximized in the algebraic codebook search at index $k$

$C_k$   The correlation in the numerator of $A_k$ at index $k$

$E_{Dk}$   The energy in the denominator of $A_k$ at index $k$

$\mathbf{d} = \mathbf{H}^t \mathbf{x}_2$   The correlation between the target signal $x_2(n)$ and the impulse response $h(n)$, i.e., backward filtered target

$\mathbf{H}$   The lower triangular Toeplitz convolution matrix with diagonal $h(0)$ and lower diagonals $h(1), \ldots, h(39)$

$\mathbf{\Phi} = \mathbf{H}^t \mathbf{H}$   The matrix of correlations of $h(n)$

$d(n)$   The elements of the vector $\mathbf{d}$

$\phi(i,j)$   The elements of the symmetric matrix $\mathbf{\Phi}$

$\mathbf{c}_k$   The innovation vector

$C$   The correlation in the numerator of $A_k$

$m_i$   The position of the $i$th pulse

$\vartheta_i$   The amplitude of the $i$th pulse

$N_p$   The number of pulses in the fixed codebook excitation

$E_D$   The energy in the denominator of $A_k$

$res_{LTP}(n)$   The normalized long-term prediction residual

$b(n)$   The signal used for presetting the signs in algebraic codebook search

$s_b(n)$   The sign signal for the algebraic codebook search

$d'(n)$   Sign extended backward filtered target

$\phi'(i,j)$   The modified elements of the matrix $\mathbf{\Phi}$, including sign information

$\mathbf{z}^t$, $z(n)$   The fixed codebook vector convolved with $h(n)$

$E(n)$   The mean-removed innovation energy (in dB)

$\bar{E}$   The mean of the innovation energy

$\tilde{E}(n)$   The predicted energy

$[b_1\ b_2\ b_3\ b_4]$   The MA prediction coefficients

$\hat{R}(k)$   The quantified prediction error at subframe $k$

$E_I$   The mean innovation energy

$R(n)$   The prediction error of the fixed-codebook gain quantization

$E_Q$   The quantization error of the fixed-codebook gain quantization

$e(n)$   The states of the synthesis filter $1/\hat{A}(z)$

$e_w(n)$   The perceptually weighted error of the analysis-by-synthesis search

$\eta$   The gain scaling factor for the emphasized excitation

$g_c$   The fixed-codebook gain

$g_c'$   The predicted fixed-codebook gain

$\hat{g}_c$   The quantified fixed codebook gain

$g_p$   The adaptive codebook gain

$\hat{g}_p$   The quantified adaptive codebook gain

$\gamma_{gc} = g_c / g_c'$   A correction factor between the gain $g_c$ and the estimated one $g_c'$

$\hat{\gamma}_{gc}$   The optimum value for $\gamma_{gc}$

$\gamma_{sc}$   Gain scaling factor

3.3 Abbreviations 

For the purposes of the present document, the following abbreviations apply. 

ACELP Algebraic Code Excited Linear Prediction 

AGC Adaptive Gain Control 

AMR Adaptive Multi-Rate 

CELP Code Excited Linear Prediction 

EFR Enhanced Full Rate 

FIR Finite Impulse Response 



ISPP Interleaved Single-Pulse Permutation

LP Linear Prediction

LPC Linear Predictive Coding

LSF Line Spectral Frequency

LSP Line Spectral Pair

LTP Long Term Predictor (or Long Term Prediction)

MA Moving Average

3GPP TS 26.090 version 10.0.0 Release 10, ETSI TS 126 090 V10.0.0 (2011-04)



4 Outline description



The present document is structured as follows: 

Clause 4.1 contains a functional description of the audio parts including the A/D and D/A functions. Clause 4.2
describes the conversion between 13-bit uniform and 8-bit A-law or μ-law samples. Clauses 4.3 and 4.4 present a
simplified description of the principles of the AMR codec encoding and decoding process respectively. In clause 4.5,
the sequence and subjective importance of encoded parameters are given.

Clause 5 presents the functional description of the AMR codec encoding, whereas clause 6 describes the decoding 
procedures. In clause 7, the detailed bit allocation of the AMR codec is tabulated. 

4.1 Functional description of audio parts 

The analogue-to-digital and digital-to-analogue conversion will in principle comprise the following elements:

1) Analogue to uniform digital PCM

- microphone;
- input level adjustment device;
- input anti-aliasing filter;
- sample-hold device sampling at 8 kHz;
- analogue-to-uniform digital conversion to 13-bit representation.

The uniform format shall be represented in two's complement.

2) Uniform digital PCM to analogue

- conversion from 13-bit/8 kHz uniform PCM to analogue;
- a hold device;
- reconstruction filter including x/sin( x ) correction;
- output level adjustment device;
- earphone or loudspeaker.

In the terminal equipment, the A/D function may be achieved either:

- by direct conversion to 13-bit uniform PCM format;
- or by conversion to 8-bit A-law or μ-law companded format, based on a standard A-law or μ-law
  codec/filter according to ITU-T Recommendations G.711 [6] and G.714, followed by the 8-bit to 13-bit
  conversion as specified in clause 4.2.1.

For the D/A operation, the inverse operations take place.

In the latter case it should be noted that the specifications in ITU-T G.714 (superseded by G.712) are concerned with
PCM equipment located in the central parts of the network. When used in the terminal equipment, the present document
does not on its own ensure sufficient out-of-band attenuation. The specification of out-of-band signals is defined in [1]
in clause 2.



4.2 Preparation of speech samples 



The encoder is fed with data comprising samples with a resolution of 13 bits left justified in a 16-bit word. The three
least significant bits are set to '0'. The decoder outputs data in the same format. Outside the speech codec further
processing must be applied if the traffic data occurs in a different representation.
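
The sample format described above can be illustrated with a short sketch. It is not taken from the reference code in [4]; the function names `to_left_justified_13bit` and `clear_low_bits` are hypothetical, and signed two's complement samples are assumed.

```python
def to_left_justified_13bit(sample_13bit: int) -> int:
    """Place a signed 13-bit sample in the upper 13 bits of a 16-bit word."""
    if not -4096 <= sample_13bit <= 4095:
        raise ValueError("sample does not fit in 13 bits")
    # Shifting left by 3 leaves the three least significant bits at zero.
    return sample_13bit << 3


def clear_low_bits(word_16bit: int) -> int:
    """Force the three least significant bits of a 16-bit sample to zero."""
    return word_16bit & ~0x7
```

For example, the largest positive 13-bit value 4095 maps to 32760, and a 16-bit value such as 12345 is reduced to 12344 when its three low bits are cleared.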

4.2.1 PCM format conversion 

The conversion between 8-bit A-law or μ-law compressed data and linear data with 13-bit resolution at the speech
encoder input shall be as defined in ITU-T Rec. G.711 [6].

ITU-T Rec. G.711 [6] specifies the A-law or μ-law to linear conversion and vice versa by providing table entries.
Examples on how to perform the conversion by fixed-point arithmetic can be found in ITU-T Rec. G.726 [7]. Clause
4.2.1 of G.726 [7] describes A-law or μ-law to linear expansion and clause 4.2.8 of G.726 [7] provides a solution for
linear to A-law or μ-law compression.

4.3 Principles of the adaptive multi-rate speech encoder 

The AMR codec consists of eight source codecs with bit-rates of 12.2, 10.2, 7.95, 7.40, 6.70, 5.90, 5.15 and 4.75 kbit/s. 

The codec is based on the code-excited linear predictive (CELP) coding model. A 10th order linear prediction (LP), or
short-term, synthesis filter is used which is given by:

$$H(z) = \frac{1}{\hat{A}(z)} = \frac{1}{1 + \sum_{i=1}^{m} \hat{a}_i z^{-i}}, \qquad (1)$$

where $\hat{a}_i$, $i = 1, \ldots, m$, are the (quantified) linear prediction (LP) parameters, and $m = 10$ is the predictor order. The
long-term, or pitch, synthesis filter is given by:

$$\frac{1}{B(z)} = \frac{1}{1 - g_p z^{-T}}, \qquad (2)$$

where $T$ is the pitch delay and $g_p$ is the pitch gain. The pitch synthesis filter is implemented using the so-called
adaptive codebook approach.
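
As an illustration of the long-term synthesis filter of equation (2), the following sketch implements the recursion y(n) = x(n) + g_p y(n - T) directly. It assumes an integer pitch delay and a zero initial filter state; the codec itself uses fractional lags through the adaptive codebook, so this is a simplified model only, and the function name `pitch_synthesis` is hypothetical.

```python
def pitch_synthesis(x, gain, lag):
    """Apply 1/B(z) = 1/(1 - gain * z^-lag), i.e. y(n) = x(n) + gain * y(n - lag).

    Assumes an integer lag and a zero initial state (illustrative simplification).
    """
    y = []
    for n, sample in enumerate(x):
        past = y[n - lag] if n >= lag else 0.0
        y.append(sample + gain * past)
    return y
```

Feeding a unit impulse through the filter with a gain of 0.5 and a lag of 3 samples produces pulses of 1, 0.5 and 0.25 spaced three samples apart, which is the periodic repetition that models voiced speech.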

The CELP speech synthesis model is shown in figure 2. In this model, the excitation signal at the input of the short-term 
LP synthesis filter is constructed by adding two excitation vectors from adaptive and fixed (innovative) codebooks. The 
speech is synthesized by feeding the two properly chosen vectors from these codebooks through the short-term 
synthesis filter. The optimum excitation sequence in a codebook is chosen using an analysis-by-synthesis search 
procedure in which the error between the original and synthesized speech is minimized according to a perceptually 
weighted distortion measure. 

The perceptual weighting filter used in the analysis-by-synthesis search technique is given by:

$$W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)}, \qquad (3)$$

where $A(z)$ is the unquantized LP filter and $0 < \gamma_2 < \gamma_1 \le 1$ are the perceptual weighting factors. The values
$\gamma_1 = 0.9$ (for the 12.2 and 10.2 kbit/s modes) or $\gamma_1 = 0.94$ (for all other modes) and $\gamma_2 = 0.6$ are used. The
weighting filter uses the unquantized LP parameters.
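
Replacing z by z/γ in a polynomial A(z) = 1 + Σ a_i z^-i scales the i-th coefficient by γ^i. The sketch below, with the hypothetical name `weight_lp_coefficients`, shows how the numerator and denominator coefficients of the weighting filter could be derived from one set of LP coefficients. It is a floating-point illustration, not the fixed-point procedure of [4].

```python
def weight_lp_coefficients(a, gamma):
    """Return the coefficients of A(z/gamma).

    `a` lists the coefficients of A(z) = sum_i a[i] z^-i with a[0] == 1,
    so the i-th coefficient is simply multiplied by gamma**i.
    """
    return [coef * gamma ** i for i, coef in enumerate(a)]


def weighting_filter_coefficients(a, gamma1, gamma2=0.6):
    """Numerator A(z/gamma1) and denominator A(z/gamma2) of the weighting filter."""
    return weight_lp_coefficients(a, gamma1), weight_lp_coefficients(a, gamma2)
```

With γ1 = 0.9 and γ2 = 0.6, a second-order example A(z) = 1 - 0.5 z^-1 + 0.25 z^-2 yields the numerator 1 - 0.45 z^-1 + 0.2025 z^-2 and the denominator 1 - 0.3 z^-1 + 0.09 z^-2.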
The coder operates on speech frames of 20 ms corresponding to 160 samples at the sampling frequency of 8 000
samples/s. At each 160 speech samples, the speech signal is analysed to extract the parameters of the CELP model (LP
filter coefficients, adaptive and fixed codebooks' indices and gains). These parameters are encoded and transmitted. At
the decoder, these parameters are decoded and speech is synthesized by filtering the reconstructed excitation signal
through the LP synthesis filter.

The signal flow at the encoder is shown in figure 3. LP analysis is performed twice per frame for the 12.2 kbit/s mode 
and once for the other modes. For the 12.2 kbit/s mode, the two sets of LP parameters are converted to line spectrum 
pairs (LSP) and jointly quantized using split matrix quantization (SMQ) with 38 bits. For the other modes, the single set 
of LP parameters is converted to line spectrum pairs (LSP) and vector quantized using split vector quantization (SVQ).
The speech frame is divided into 4 subframes of 5 ms each (40 samples). The adaptive and fixed codebook parameters 
are transmitted every subframe. The quantized and unquantized LP parameters or their interpolated versions are used 
depending on the subframe. An open-loop pitch lag is estimated in every other subframe (except for the 5.15 and 4.75 
kbit/s modes for which it is done once per frame) based on the perceptually weighted speech signal. 

Then the following operations are repeated for each subframe:

- The target signal $x(n)$ is computed by filtering the LP residual through the weighted synthesis filter
  $W(z)H(z)$ with the initial states of the filters having been updated by filtering the error between LP residual
  and excitation (this is equivalent to the common approach of subtracting the zero input response of the weighted
  synthesis filter from the weighted speech signal).

- The impulse response, $h(n)$, of the weighted synthesis filter is computed.

- Closed-loop pitch analysis is then performed (to find the pitch lag and gain), using the target $x(n)$ and impulse
  response $h(n)$, by searching around the open-loop pitch lag. Fractional pitch with 1/6th or 1/3rd of a sample
  resolution (depending on the mode) is used.

- The target signal $x(n)$ is updated by removing the adaptive codebook contribution (filtered adaptive
  codevector), and this new target, $x_2(n)$, is used in the fixed algebraic codebook search (to find the optimum
  innovation).

- The gains of the adaptive and fixed codebook are scalar quantified with 4 and 5 bits respectively or vector
  quantified with 6-7 bits (with moving average (MA) prediction applied to the fixed codebook gain).

- Finally, the filter memories are updated (using the determined excitation signal) for finding the target signal in
  the next subframe.

The bit allocation of the AMR codec modes is shown in table 1. In each 20 ms speech frame, 95, 103, 118, 134, 148, 
159, 204 or 244 bits are produced, corresponding to a bit-rate of 4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2 or 12.2 kbit/s. 
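More detailed bit allocation among the codec parameters is given in tables 9a-9h. Note that the most significant bits
(MSB) are always sent first.

Table 1: Bit allocation of the AMR coding algorithm for 20 ms frame

| Mode                   | Parameter      | 1st subframe | 2nd subframe | 3rd subframe | 4th subframe | Total per frame |
|------------------------|----------------|--------------|--------------|--------------|--------------|-----------------|
| 12.2 kbit/s (GSM EFR)  | 2 LSP sets     |              |              |              |              | 38              |
|                        | Pitch delay    | 9            | 6            | 9            | 6            | 30              |
|                        | Pitch gain     | 4            | 4            | 4            | 4            | 16              |
|                        | Algebraic code | 35           | 35           | 35           | 35           | 140             |
|                        | Codebook gain  | 5            | 5            | 5            | 5            | 20              |
|                        | Total          |              |              |              |              | 244             |
| 10.2 kbit/s            | LSP set        |              |              |              |              | 26              |
|                        | Pitch delay    | 8            | 5            | 8            | 5            | 26              |
|                        | Algebraic code | 31           | 31           | 31           | 31           | 124             |
|                        | Gains          | 7            | 7            | 7            | 7            | 28              |
|                        | Total          |              |              |              |              | 204             |
| 7.95 kbit/s            | LSP set        |              |              |              |              | 27              |
|                        | Pitch delay    | 8            | 6            | 8            | 6            | 28              |
|                        | Pitch gain     | 4            | 4            | 4            | 4            | 16              |
|                        | Algebraic code | 17           | 17           | 17           | 17           | 68              |
|                        | Codebook gain  | 5            | 5            | 5            | 5            | 20              |
|                        | Total          |              |              |              |              | 159             |
| 7.40 kbit/s (TDMA EFR) | LSP set        |              |              |              |              | 26              |
|                        | Pitch delay    | 8            | 5            | 8            | 5            | 26              |
|                        | Algebraic code | 17           | 17           | 17           | 17           | 68              |
|                        | Gains          | 7            | 7            | 7            | 7            | 28              |
|                        | Total          |              |              |              |              | 148             |
| 6.70 kbit/s (PDC EFR)  | LSP set        |              |              |              |              | 26              |
|                        | Pitch delay    | 8            | 4            | 8            | 4            | 24              |
|                        | Algebraic code | 14           | 14           | 14           | 14           | 56              |
|                        | Gains          | 7            | 7            | 7            | 7            | 28              |
|                        | Total          |              |              |              |              | 134             |
| 5.90 kbit/s            | LSP set        |              |              |              |              | 26              |
|                        | Pitch delay    | 8            | 4            | 8            | 4            | 24              |
|                        | Algebraic code | 11           | 11           | 11           | 11           | 44              |
|                        | Gains          | 6            | 6            | 6            | 6            | 24              |
|                        | Total          |              |              |              |              | 118             |
| 5.15 kbit/s            | LSP set        |              |              |              |              | 23              |
|                        | Pitch delay    | 8            | 4            | 4            | 4            | 20              |
|                        | Algebraic code | 9            | 9            | 9            | 9            | 36              |
|                        | Gains          | 6            | 6            | 6            | 6            | 24              |
|                        | Total          |              |              |              |              | 103             |
| 4.75 kbit/s            | LSP set        |              |              |              |              | 23              |
|                        | Pitch delay    | 8            | 4            | 4            | 4            | 20              |
|                        | Algebraic code | 9            | 9            | 9            | 9            | 36              |
|                        | Gains          | 8            |              | 8            |              | 16              |
|                        | Total          |              |              |              |              | 95              |
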
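
The relation between the per-frame bit totals of table 1 and the mode bit rates can be checked with a few lines of arithmetic. The sketch below copies the per-parameter frame totals from table 1; the dictionary layout and the function names are illustrative choices, not part of the specification.

```python
FRAME_MS = 20  # frame duration in milliseconds

# Per-frame bit totals for each parameter group, copied from table 1.
ALLOCATION = {
    "12.2": {"LSP": 38, "pitch delay": 30, "pitch gain": 16,
             "algebraic code": 140, "codebook gain": 20},
    "10.2": {"LSP": 26, "pitch delay": 26, "algebraic code": 124, "gains": 28},
    "7.95": {"LSP": 27, "pitch delay": 28, "pitch gain": 16,
             "algebraic code": 68, "codebook gain": 20},
    "7.40": {"LSP": 26, "pitch delay": 26, "algebraic code": 68, "gains": 28},
    "6.70": {"LSP": 26, "pitch delay": 24, "algebraic code": 56, "gains": 28},
    "5.90": {"LSP": 26, "pitch delay": 24, "algebraic code": 44, "gains": 24},
    "5.15": {"LSP": 23, "pitch delay": 20, "algebraic code": 36, "gains": 24},
    "4.75": {"LSP": 23, "pitch delay": 20, "algebraic code": 36, "gains": 16},
}


def bits_per_frame(mode: str) -> int:
    """Sum the parameter bits of one 20 ms frame for the given mode."""
    return sum(ALLOCATION[mode].values())


def bit_rate_kbps(mode: str) -> float:
    """Bits per millisecond are numerically equal to kbit/s."""
    return bits_per_frame(mode) / FRAME_MS
```

For instance, the 12.2 kbit/s mode sums to 244 bits per frame, and 244 bits every 20 ms corresponds to 12.2 kbit/s.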


4.4 Principles of the adaptive multi-rate speech decoder 

The signal flow at the decoder is shown in figure 4. At the decoder, based on the chosen mode, the transmitted indices 
are extracted from the received bitstream. The indices are decoded to obtain the coder parameters at each transmission 
frame. These parameters are the LSP vectors, the fractional pitch lags, the innovative codevectors, and the pitch and 
innovative gains. The LSP vectors are converted to the LP filter coefficients and interpolated to obtain LP filters at each 
subframe. Then, at each 40-sample subframe: 

- the excitation is constructed by adding the adaptive and innovative codevectors scaled by their respective gains;

- the speech is reconstructed by filtering the excitation through the LP synthesis filter.

Finally, the reconstructed speech signal is passed through an adaptive postfilter. 
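
The two per-subframe decoder steps can be sketched in floating point as follows. This is a simplified model under stated assumptions: the codevectors and gains are taken as already decoded, the LP filter is A(z) = 1 + Σ a_i z^-i, no postfilter is applied, and the names `build_excitation` and `lp_synthesis` are hypothetical. The bit-exact behaviour is defined by the ANSI-C code in [4].

```python
def build_excitation(v, c, g_p, g_c):
    """Excitation u(n) = g_p * v(n) + g_c * c(n) for one subframe."""
    return [g_p * vi + g_c * ci for vi, ci in zip(v, c)]


def lp_synthesis(u, a, state):
    """Filter u through 1/A(z), with A(z) = 1 + sum_{i=1..m} a[i-1] z^-i.

    `state` holds the last m output samples, most recent first, and the
    updated state is returned so that the next subframe can continue from it.
    """
    mem = list(state)
    out = []
    for sample in u:
        y = sample - sum(ai * mi for ai, mi in zip(a, mem))
        out.append(y)
        mem = [y] + mem[:-1]
    return out, mem
```

Passing the returned state into the next call keeps the synthesis filter continuous across subframe boundaries, which matters because the LP coefficients change from subframe to subframe while the filter memory does not reset.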




4.5 Sequence and subjective importance of encoded parameters

The encoder will produce the output information in a unique sequence and format, and the decoder must receive the
same information in the same way. In tables 9a-9h, the sequence of output bits and the bit allocation for each parameter
are shown.

The different parameters of the encoded speech and their individual bits have unequal importance with respect to
subjective quality. The output and input frame formats for the AMR speech codec are given in [2], where a reordering
of bits takes place.



5 Functional description of the encoder 

In this clause, the different functions of the encoder represented in figure 3 are described. 

5.1 Pre-processing (all modes) 

Two pre-processing functions are applied prior to the encoding process: high-pass filtering and signal down-scaling. 

Down-scaling consists of dividing the input by a factor of 2 to reduce the possibility of overflows in the fixed-point 
implementation. 

The high-pass filter serves as a precaution against undesired low frequency components. A filter with a cut-off
frequency of 80 Hz is used, and it is given by:

$$H_{h1}(z) = \frac{0.927246093 - 1.8544941 z^{-1} + 0.927246903 z^{-2}}{1 - 1.906005859 z^{-1} + 0.911376953 z^{-2}}. \qquad (4)$$

Down-scaling and high-pass filtering are combined by dividing the coefficients at the numerator of $H_{h1}(z)$ by 2.
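
A floating-point sketch of the combined pre-processing step is given below. The coefficients are copied as printed in equation (4), the numerator is halved to fold in the down-scaling, and the filter is realised as a direct form I second-order section. The function name `preprocess` and the state layout are assumptions made for illustration; the normative fixed-point version is the one in [4].

```python
# Coefficients of equation (4), copied as printed in the present document.
B = (0.927246093, -1.8544941, 0.927246903)   # numerator
A = (1.0, -1.906005859, 0.911376953)         # denominator


def preprocess(samples, state=None):
    """High-pass filter and down-scale by 2 in one pass.

    `state` is (x1, x2, y1, y2), the two previous inputs and outputs;
    it is returned so that consecutive frames can be filtered seamlessly.
    """
    b0, b1, b2 = (coef / 2.0 for coef in B)  # down-scaling folded into numerator
    _, a1, a2 = A
    x1, x2, y1, y2 = state or (0.0, 0.0, 0.0, 0.0)
    out = []
    for x0 in samples:
        y0 = b0 * x0 + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out, (x1, x2, y1, y2)
```

A constant (DC) input is almost completely removed once the transient has died out, which is the intended effect of the high-pass filter.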

5.2 Linear prediction analysis and quantization 

12.2 kbit/s mode 

Short-term prediction, or linear prediction (LP), analysis is performed twice per speech frame using the auto-correlation 
approach with 30 ms asymmetric windows. No lookahead is used in the auto-correlation computation. 

The auto-correlations of windowed speech are converted to the LP coefficients using the Levinson-Durbin algorithm. 
Then the LP coefficients are transformed to the Line Spectral Pair (LSP) domain for quantization and interpolation 
purposes. The interpolated quantified and unquantized filter coefficients are converted back to the LP filter coefficients 
(to construct the synthesis and weighting filters at each subframe). 

10.2, 7.95, 7.40, 6.70, 5.90, 5.15, 4.75 kbit/s modes 

Short-term prediction, or linear prediction (LP), analysis is performed once per speech frame using the auto-correlation 
approach with 30 ms asymmetric windows. A lookahead of 40 samples (5 ms) is used in the auto-correlation 
computation. 

The auto-correlations of windowed speech are converted to the LP coefficients using the Levinson-Durbin algorithm.
Then the LP coefficients are transformed to the Line Spectral Pair (LSP) domain for quantization and interpolation
purposes. The interpolated quantified and unquantized filter coefficients are converted back to the LP filter coefficients
(to construct the synthesis and weighting filters at each subframe).

5.2.1 Windowing and auto-correlation computation

12.2 kbit/s mode 

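LP analysis is performed twice per frame using two different asymmetric windows. The first window has its weight
concentrated at the second subframe and it consists of two halves of Hamming windows with different sizes. The
window is given by:

$$w_I(n) = \begin{cases} 0.54 - 0.46 \cos\left(\dfrac{\pi n}{L_1^{(I)} - 1}\right), & n = 0, \ldots, L_1^{(I)} - 1, \\[2ex] 0.54 + 0.46 \cos\left(\dfrac{\pi (n - L_1^{(I)})}{L_2^{(I)} - 1}\right), & n = L_1^{(I)}, \ldots, L_1^{(I)} + L_2^{(I)} - 1. \end{cases} \qquad (5)$$

The values $L_1^{(I)} = 160$ and $L_2^{(I)} = 80$ are used. The second window has its weight concentrated at the fourth
subframe and it consists of two parts: the first part is half a Hamming window and the second part is a quarter of a
cosine function cycle. The window is given by:

$$w_{II}(n) = \begin{cases} 0.54 - 0.46 \cos\left(\dfrac{2\pi n}{2L_1^{(II)} - 1}\right), & n = 0, \ldots, L_1^{(II)} - 1, \\[2ex] \cos\left(\dfrac{2\pi (n - L_1^{(II)})}{4L_2^{(II)} - 1}\right), & n = L_1^{(II)}, \ldots, L_1^{(II)} + L_2^{(II)} - 1, \end{cases} \qquad (6)$$

where the values $L_1^{(II)} = 232$ and $L_2^{(II)} = 8$ are used.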

Note that both LP analyses are performed on the same set of speech samples. The windows are applied to 80 samples
from past speech frame in addition to the 160 samples of the present speech frame. No samples from future frames are
used (no lookahead). A diagram of the two LP analysis windows is depicted below.

Figure 1: LP analysis windows (diagram not reproduced; it spans frame n-1 and frame n and marks the 20 ms frame
of 160 samples and the 5 ms subframe of 40 samples)
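
The two window shapes can be generated with the sketch below. It assumes the usual reading of equations (5) and (6): two Hamming halves of lengths L1 and L2 for the first window, and a Hamming half followed by a quarter cosine cycle with argument 2π(n - L1)/(4 L2 - 1) for the second. The function names are hypothetical, and the default lengths are the 12.2 kbit/s values quoted in the text.

```python
import math


def window_i(l1=160, l2=80):
    """First 12.2 kbit/s window: two Hamming halves of lengths l1 and l2."""
    rising = [0.54 - 0.46 * math.cos(math.pi * n / (l1 - 1)) for n in range(l1)]
    falling = [0.54 + 0.46 * math.cos(math.pi * n / (l2 - 1)) for n in range(l2)]
    return rising + falling


def window_ii(l1=232, l2=8):
    """Second window: Hamming half of length l1, then a quarter cosine cycle."""
    rising = [0.54 - 0.46 * math.cos(2 * math.pi * n / (2 * l1 - 1))
              for n in range(l1)]
    falling = [math.cos(2 * math.pi * n / (4 * l2 - 1)) for n in range(l2)]
    return rising + falling
```

Both windows are 240 samples long. Calling `window_ii(200, 40)` gives the single window used by the other modes, whose last 40 samples cover the 5 ms lookahead.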

The auto-correlations of the windowed speech $s'(n)$, $n = 0, \ldots, 239$, are computed by:

$$r_{ac}(k) = \sum_{n=k}^{239} s'(n)\, s'(n-k), \qquad k = 0, \ldots, 10, \qquad (7)$$

and a 60 Hz bandwidth expansion is used by lag windowing the auto-correlations using the window:

$$w_{lag}(i) = \exp\left[-\frac{1}{2}\left(\frac{2\pi f_0 i}{f_s}\right)^2\right], \qquad i = 1, \ldots, 10, \qquad (8)$$

where $f_0 = 60$ Hz is the bandwidth expansion and $f_s = 8000$ Hz is the sampling frequency. Further, $r_{ac}(0)$ is
multiplied by the white noise correction factor 1.0001 which is equivalent to adding a noise floor at -40 dB.

10.2, 7.95, 7.40, 6.70, 5.90, 5.15, 4.75 kbit/s modes 

LP analysis is performed once per frame using an asymmetric window. The window has its weight concentrated at the
fourth subframe and it consists of two parts: the first part is half a Hamming window and the second part is a quarter of
a cosine function cycle. The window is given by equation (6) where the values $L_1^{(II)} = 200$ and $L_2^{(II)} = 40$ are used.

The auto-correlations of the windowed speech $s'(n)$, $n = 0, \ldots, 239$, are computed by equation (7) and a 60 Hz
bandwidth expansion is used by lag windowing the auto-correlations using the window of equation (8). Further, $r_{ac}(0)$
is multiplied by the white noise correction factor 1.0001 which is equivalent to adding a noise floor at -40 dB.

5.2.2 Levinson-Durbin algorithm (all modes) 

The modified auto-correlations $r'_{ac}(0) = 1.0001\, r_{ac}(0)$ and $r'_{ac}(k) = r_{ac}(k)\, w_{lag}(k)$, $k = 1, \ldots, 10$, are used to
obtain the direct form LP filter coefficients $a_k$, $k = 1, \ldots, 10$, by solving the set of equations:

$$
\sum_{k=1}^{10} a_k\, r'_{ac}(|i - k|) = -r'_{ac}(i), \qquad i = 1, \ldots, 10. \qquad (9)
$$

The set of equations in (9) is solved using the Levinson-Durbin algorithm. This algorithm uses the following recursion:

    E_LD(0) = r'_ac(0)
    for i = 1 to 10 do
        a_0^(i-1) = 1
        k_i = -[ sum_{j=0}^{i-1} a_j^(i-1) r'_ac(i-j) ] / E_LD(i-1)
        a_i^(i) = k_i
        for j = 1 to i-1 do
            a_j^(i) = a_j^(i-1) + k_i a_{i-j}^(i-1)
        end
        E_LD(i) = (1 - k_i^2) E_LD(i-1)
    end

The final solution is given as $a_j = a_j^{(10)}$, $j = 1, \ldots, 10$.
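
A minimal floating point sketch of the recursion is given below for illustration. The function name `levinson_durbin` and the list-based interface are assumptions made for this example; the normative fixed point version is the ANSI-C code in [4].

```python
def levinson_durbin(r, order=10):
    """Solve equation (9) for a_1..a_order given modified auto-correlations r[0..order].

    Returns (a, err) where a[0] is a_1 and err is the final prediction error E_LD(order).
    """
    a = [1.0] + [0.0] * order          # a[0] is a_0 = 1
    err = r[0]                         # E_LD(0)
    for i in range(1, order + 1):
        # Reflection coefficient k_i from the order i-1 solution.
        acc = sum(a[j] * r[i - j] for j in range(i))
        k = -acc / err
        prev = a[:]                    # coefficients of order i-1
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)           # E_LD(i)
    return a[1:], err
```

For auto-correlations of the form r(k) = 0.5^k, which describe a first order process, the sketch returns a_1 = -0.5 with all higher coefficients equal to zero.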

The LP filter coefficients are converted to the line spectral pair (LSP) representation for quantization and interpolation 
purposes. The conversions to the LSP domain and back to the LP filter coefficient domain are described in the next 
clause. 

5.2.3 LP to LSP conversion (all modes) 

The LP filter coefficients $a_k$, $k = 1, \ldots, 10$, are converted to the line spectral pair (LSP) representation for quantization
and interpolation purposes. For a 10th order LP filter, the LSPs are defined as the roots of the sum and difference
polynomials:

$$
F_1'(z) = A(z) + z^{-11} A(z^{-1}) \qquad (10)
$$

and

$$
F_2'(z) = A(z) - z^{-11} A(z^{-1}), \qquad (11)
$$

respectively. The polynomials $F_1'(z)$ and $F_2'(z)$ are symmetric and anti-symmetric, respectively. It can be proven that
all roots of these polynomials are on the unit circle and they alternate each other. $F_1'(z)$ has a root $z = -1$ ($\omega = \pi$)
and $F_2'(z)$ has a root $z = 1$ ($\omega = 0$). To eliminate these two roots, we define the new polynomials:

$$
F_1(z) = F_1'(z) / (1 + z^{-1}) \qquad (12)
$$

and

$$
F_2(z) = F_2'(z) / (1 - z^{-1}). \qquad (13)
$$

Each polynomial has 5 conjugate roots on the unit circle ($e^{\pm j \omega_i}$), therefore, the polynomials can be written as

$$
F_1(z) = \prod_{i = 1, 3, \ldots, 9} \left( 1 - 2 q_i z^{-1} + z^{-2} \right) \qquad (14)
$$

and

$$
F_2(z) = \prod_{i = 2, 4, \ldots, 10} \left( 1 - 2 q_i z^{-1} + z^{-2} \right), \qquad (15)
$$

where $q_i = \cos(\omega_i)$ with $\omega_i$ being the line spectral frequencies (LSF) and they satisfy the ordering property
$0 < \omega_1 < \omega_2 < \ldots < \omega_{10} < \pi$. We refer to $q_i$ as the LSPs in the cosine domain.

Since both polynomials $F_1(z)$ and $F_2(z)$ are symmetric only the first 5 coefficients of each polynomial need to be
computed. The coefficients of these polynomials are found by the recursive relations (for $i = 0$ to 4):

$$
\begin{aligned}
f_1(i+1) &= a_{i+1} + a_{m-i} - f_1(i), \\
f_2(i+1) &= a_{i+1} - a_{m-i} + f_2(i),
\end{aligned}
\qquad (16)
$$

where $m = 10$ is the predictor order.

The LSPs are found by evaluating the polynomials $F_1(z)$ and $F_2(z)$ at 60 points equally spaced between 0 and $\pi$
and checking for sign changes. A sign change signifies the existence of a root and the sign change interval is then
divided 4 times to better track the root. The Chebyshev polynomials are used to evaluate $F_1(z)$ and $F_2(z)$. In this
method the roots are found directly in the cosine domain $\{q_i\}$. The polynomials $F_1(z)$ or $F_2(z)$ evaluated at
$z = e^{j\omega}$ can be written as:

$$
F(\omega) = 2 e^{-j 5 \omega} C(x),
$$

with:

$$
C(x) = T_5(x) + f(1) T_4(x) + f(2) T_3(x) + f(3) T_2(x) + f(4) T_1(x) + f(5)/2, \qquad (17)
$$

where $T_m(x) = \cos(m\omega)$ is the $m$th order Chebyshev polynomial, and $f(i)$, $i = 1, \ldots, 5$, are the coefficients of
either $F_1(z)$ or $F_2(z)$, computed using the equations in (16). The polynomial $C(x)$ is evaluated at a certain value of
$x = \cos(\omega)$ using the recursive relation:

    for k = 4 down to 1
        lambda_k = 2 x lambda_{k+1} - lambda_{k+2} + f(5-k)
    end
    C(x) = x lambda_1 - lambda_2 + f(5)/2,

with initial values $\lambda_5 = 1$ and $\lambda_6 = 0$. The details of the Chebyshev polynomial evaluation method are found in P.
Kabal and R.P. Ramachandran [4].
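
The recursive evaluation of $C(x)$ can be sketched as follows. This is a floating point illustration only; the function name `chebyshev_eval` and the convention that the list `f` holds f(1) to f(5) are assumptions made for this example.

```python
import math

def chebyshev_eval(x, f):
    """Evaluate C(x) of equation (17) with the recursion above.

    f is [f(1), f(2), f(3), f(4), f(5)]; x = cos(omega).
    """
    lam_1, lam_2 = 1.0, 0.0            # lambda_5 and lambda_6
    for k in range(4, 0, -1):
        lam = 2.0 * x * lam_1 - lam_2 + f[5 - k - 1]   # f(5 - k)
        lam_1, lam_2 = lam, lam_1
    # Here lam_1 holds lambda_1 and lam_2 holds lambda_2.
    return x * lam_1 - lam_2 + f[4] / 2.0

def chebyshev_direct(omega, f):
    """Direct evaluation of equation (17) using T_m(x) = cos(m * omega), for comparison."""
    return (math.cos(5 * omega) + f[0] * math.cos(4 * omega) + f[1] * math.cos(3 * omega)
            + f[2] * math.cos(2 * omega) + f[3] * math.cos(omega) + f[4] / 2.0)
```

The recursion needs only one cosine value per evaluation point, whereas the direct form needs five, which is why it suits a root search that evaluates the polynomial at many points.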

5.2.4 LSP to LP conversion (all modes) 

Once the LSPs are quantified and interpolated, they are converted back to the LP coefficient domain $\{a_k\}$. The
conversion to the LP domain is done as follows. The coefficients of $F_1(z)$ or $F_2(z)$ are found by expanding
equations (14) and (15) knowing the quantified and interpolated LSPs $q_i$, $i = 1, \ldots, 10$. The following recursive
relation is used to compute $f_1(i)$:

    for i = 1 to 5
        f_1(i) = -2 q_{2i-1} f_1(i-1) + 2 f_1(i-2)
        for j = i-1 down to 1
            f_1(j) = f_1(j) - 2 q_{2i-1} f_1(j-1) + f_1(j-2)
        end
    end

with initial values $f_1(0) = 1$ and $f_1(-1) = 0$. The coefficients $f_2(i)$ are computed similarly by replacing $q_{2i-1}$ by
$q_{2i}$.

Once the coefficients $f_1(i)$ and $f_2(i)$ are found, $F_1(z)$ and $F_2(z)$ are multiplied by $1 + z^{-1}$ and $1 - z^{-1}$,
respectively, to obtain $F_1'(z)$ and $F_2'(z)$; that is:

$$
\begin{aligned}
f_1'(i) &= f_1(i) + f_1(i-1), \qquad i = 1, \ldots, 5, \\
f_2'(i) &= f_2(i) - f_2(i-1), \qquad i = 1, \ldots, 5.
\end{aligned}
\qquad (18)
$$

Finally the LP coefficients are found by:

$$
a_i =
\begin{cases}
0.5 f_1'(i) + 0.5 f_2'(i), & i = 1, \ldots, 5, \\
0.5 f_1'(11 - i) - 0.5 f_2'(11 - i), & i = 6, \ldots, 10.
\end{cases}
\qquad (19)
$$

This is directly derived from the relation $A(z) = (F_1'(z) + F_2'(z))/2$, and considering the fact that $F_1'(z)$ and
$F_2'(z)$ are symmetric and anti-symmetric polynomials, respectively.
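
The conversion can be sketched in floating point as shown below. The function names are hypothetical and the list `q` is assumed to hold $q_1, \ldots, q_{10}$ in order. The helper `residual_at_lsp` is not part of the procedure; it only checks that the sum or difference polynomial of equations (10) and (11) vanishes at a given LSP.

```python
import cmath
import math

def lsp_poly(q_sel):
    """Coefficients f(0)..f(5) of the product of (1 - 2 q z^-1 + z^-2) over five LSPs."""
    f = [0.0] * 7                      # f[j + 1] stores f(j); f[0] is f(-1) = 0
    f[1] = 1.0                         # f(0) = 1
    for i in range(1, 6):
        q = q_sel[i - 1]
        f[i + 1] = -2.0 * q * f[i] + 2.0 * f[i - 1]
        for j in range(i - 1, 0, -1):
            f[j + 1] += -2.0 * q * f[j] + f[j - 1]
    return f[1:]

def lsp_to_lp(q):
    """Equations (18) and (19): LSPs q_1..q_10 (cosine domain) to a_1..a_10."""
    f1 = lsp_poly(q[0::2])             # q_1, q_3, ..., q_9
    f2 = lsp_poly(q[1::2])             # q_2, q_4, ..., q_10
    f1p = [f1[i] + f1[i - 1] for i in range(1, 6)]   # f1'(1..5)
    f2p = [f2[i] - f2[i - 1] for i in range(1, 6)]   # f2'(1..5)
    a = [0.5 * (f1p[i] + f2p[i]) for i in range(5)]
    a += [0.5 * (f1p[10 - i] - f2p[10 - i]) for i in range(6, 11)]
    return a

def residual_at_lsp(a, q_i, sign):
    """|A(z) + sign * z^-11 A(1/z)| at z = exp(j * arccos(q_i)); close to 0 at an LSP."""
    z = cmath.exp(1j * math.acos(q_i))
    def lp_poly(v):
        return 1.0 + sum(a[i] * v ** -(i + 1) for i in range(10))
    return abs(lp_poly(z) + sign * z ** -11 * lp_poly(1.0 / z))
```

With `sign = +1` the helper evaluates $F_1'(z)$, which should vanish at the odd-indexed LSPs, and with `sign = -1` it evaluates $F_2'(z)$, which should vanish at the even-indexed ones.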

5.2.5 Quantization of the LSP coefficients 

12.2 kbit/s mode 

The two sets of LP filter coefficients per frame are quantified using the LSP representation in the frequency domain;
that is:

$$
f_i = \frac{f_s}{2 \pi} \arccos(q_i), \qquad i = 1, \ldots, 10, \qquad (20)
$$

where $f_i$ are the line spectral frequencies (LSF) in Hz [0, 4 000] and $f_s = 8000$ is the sampling frequency. The LSF
vector is given by $\mathbf{f}^t = [f_1\ f_2\ \ldots\ f_{10}]$, with $t$ denoting transpose.

A 1st order MA prediction is applied, and the two residual LSF vectors are jointly quantified using split matrix
quantization (SMQ). The prediction and quantization are performed as follows. Let $\mathbf{z}^{(1)}(n)$ and $\mathbf{z}^{(2)}(n)$ denote the
mean-removed LSF vectors at frame $n$. The prediction residual vectors $\mathbf{r}^{(1)}(n)$ and $\mathbf{r}^{(2)}(n)$ are given by:

$$
\begin{aligned}
\mathbf{r}^{(1)}(n) &= \mathbf{z}^{(1)}(n) - \mathbf{p}(n), \text{ and} \\
\mathbf{r}^{(2)}(n) &= \mathbf{z}^{(2)}(n) - \mathbf{p}(n),
\end{aligned}
\qquad (21)
$$

where $\mathbf{p}(n)$ is the predicted LSF vector at frame $n$. First order moving-average (MA) prediction is used where:

$$
\mathbf{p}(n) = 0.65\, \hat{\mathbf{r}}^{(2)}(n-1), \qquad (22)
$$

where $\hat{\mathbf{r}}^{(2)}(n-1)$ is the quantified second residual vector at the past frame.

The two LSF residual vectors $\mathbf{r}^{(1)}$ and $\mathbf{r}^{(2)}$ are jointly quantified using split matrix quantization (SMQ). The matrix
$(\mathbf{r}^{(1)}\ \mathbf{r}^{(2)})$ is split into 5 submatrices of dimension 2x2 (two elements from each vector). For example, the first
submatrix consists of the elements $r_1^{(1)}$, $r_2^{(1)}$, $r_1^{(2)}$, and $r_2^{(2)}$. The 5 submatrices are quantified with 7, 8, 8+1, 8, and
6 bits, respectively. The third submatrix uses a 256-entry signed codebook (8-bit index plus 1-bit sign).

A weighted LSP distortion measure is used in the quantization process. In general, for an input LSP vector $\mathbf{f}$ and a
quantified vector at index $k$, $\hat{\mathbf{f}}^k$, the quantization is performed by finding the index $k$ which minimizes:

$$
E_{LSP} = \sum_{i=1}^{10} \left[ f_i w_i - \hat{f}_i^k w_i \right]^2. \qquad (23)
$$

The weighting factors $w_i$, $i = 1, \ldots, 10$, are given by

$$
w_i =
\begin{cases}
3.347 - \dfrac{1.547}{450}\, d_i, & \text{for } d_i < 450, \\[2ex]
1.8 - \dfrac{0.8}{1050}\, (d_i - 450), & \text{otherwise,}
\end{cases}
\qquad (24)
$$

where $d_i = f_{i+1} - f_{i-1}$ with $f_0 = 0$ and $f_{11} = 4000$. Here, two sets of weighting coefficients are computed for the
two LSF vectors. In the quantization of each submatrix, two weighting coefficients from each set are used with their
corresponding LSFs.
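
The weighting of equation (24) and the distortion of equation (23) can be sketched as follows. This is a floating point illustration with hypothetical function names; the codebook search itself and the codebook contents, which are defined in [4], are not shown.

```python
def lsf_weights(f):
    """Equation (24): weights w_1..w_10 for ten ascending LSFs given in Hz."""
    ext = [0.0] + list(f) + [4000.0]   # f_0 = 0 and f_11 = 4000
    weights = []
    for i in range(1, 11):
        d = ext[i + 1] - ext[i - 1]    # d_i = f_{i+1} - f_{i-1}
        if d < 450.0:
            weights.append(3.347 - 1.547 / 450.0 * d)
        else:
            weights.append(1.8 - 0.8 / 1050.0 * (d - 450.0))
    return weights

def weighted_distortion(f, f_hat, weights):
    """Equation (23): weighted squared error between f and a candidate f_hat."""
    return sum((fi * wi - fhi * wi) ** 2 for fi, fhi, wi in zip(f, f_hat, weights))
```

The two branches of equation (24) meet at $d_i = 450$, where both give a weight of 1.8. LSFs whose neighbours are close together (small $d_i$) receive larger weights.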

10.2, 7.95, 7.40, 6.70, 5.90, 5.15, 4.75 kbit/s modes 

The set of LP filter coefficients per frame is quantified using the LSP representation in the frequency domain using 
equation (20). 

A 1st order MA prediction is applied, and the residual LSF vector is quantified using split vector quantization. The
prediction and quantization are performed as follows. Let $\mathbf{z}(n)$ denote the mean-removed LSF vector at frame $n$.
The prediction residual vector $\mathbf{r}(n)$ is given by:

$$
\mathbf{r}(n) = \mathbf{z}(n) - \mathbf{p}(n), \qquad (25)
$$

where $\mathbf{p}(n)$ is the predicted LSF vector at frame $n$. First order moving-average (MA) prediction is used where:

$$
p_j(n) = \alpha_j\, \hat{r}_j(n-1), \qquad j = 1, \ldots, 10, \qquad (26)
$$

where $\hat{\mathbf{r}}(n-1)$ is the quantified residual vector at the past frame and $\alpha_j$ is the prediction factor for the $j$th LSF.

The LSF residual vector $\mathbf{r}$ is quantified using split vector quantization. The vector $\mathbf{r}$ is split into 3 subvectors of
dimension 3, 3, and 4. The 3 subvectors are quantified with 7-9 bits according to table 2.

Table 2. Bit allocation split vector quantization of LSF residual vector.

Mode           Subvector 1    Subvector 2    Subvector 3
10.2 kbit/s    8              9              9
7.95 kbit/s    9              9              9
7.40 kbit/s    8              9              9
6.70 kbit/s    8              9              9
5.90 kbit/s    8              9              9
5.15 kbit/s    8              8              7
4.75 kbit/s    8              8              7


The weighted LSP distortion measure of equation (23) with the weighting of equation (24) is used in the quantization 
process. 

5.2.6 Interpolation of the LSPs 
12.2 kbit/s mode 

The two sets of quantified (and unquantized) LP parameters are used for the second and fourth subframes whereas the
first and third subframes use a linear interpolation of the parameters in the adjacent subframes. The interpolation is
performed on the LSPs in the $q$ domain. Let $\mathbf{q}_4^{(n)}$ be the LSP vector at the 4th subframe of the present frame $n$, $\mathbf{q}_2^{(n)}$
be the LSP vector at the 2nd subframe of the present frame $n$, and $\mathbf{q}_4^{(n-1)}$ the LSP vector at the 4th subframe of the past
frame $n-1$. The interpolated LSP vectors at the 1st and 3rd subframes are given by:

$$
\begin{aligned}
\mathbf{q}_1^{(n)} &= 0.5\, \mathbf{q}_4^{(n-1)} + 0.5\, \mathbf{q}_2^{(n)}, \\
\mathbf{q}_3^{(n)} &= 0.5\, \mathbf{q}_2^{(n)} + 0.5\, \mathbf{q}_4^{(n)}.
\end{aligned}
\qquad (27)
$$



The interpolated LSP vectors are used to compute a different LP filter at each subframe (both quantified and 
unquantized coefficients) using the LSP to LP conversion method described in clause 5.2.4. 

10.2, 7.95, 7.40, 6.70, 5.90, 5.15, 4.75 kbit/s modes 

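The set of quantified (and unquantized) LP parameters is used for the fourth subframe whereas the first, second, and
third subframes use a linear interpolation of the parameters in the adjacent subframes. The interpolation is performed on
the LSPs in the $q$ domain. The interpolated LSP vectors at the 1st, 2nd, and 3rd subframes are given by:

$$
\begin{aligned}
\mathbf{q}_1^{(n)} &= 0.75\, \mathbf{q}_4^{(n-1)} + 0.25\, \mathbf{q}_4^{(n)}, \\
\mathbf{q}_2^{(n)} &= 0.5\, \mathbf{q}_4^{(n-1)} + 0.5\, \mathbf{q}_4^{(n)}, \\
\mathbf{q}_3^{(n)} &= 0.25\, \mathbf{q}_4^{(n-1)} + 0.75\, \mathbf{q}_4^{(n)}.
\end{aligned}
\qquad (28)
$$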

The interpolated LSP vectors are used to compute a different LP filter at each subframe (both quantified and 
unquantized coefficients) using the LSP to LP conversion method described in clause 5.2.4. 
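
For illustration, the two interpolation schemes can be written as the following floating point sketch. The function names are hypothetical; each function returns the LSP vectors for subframes 1 to 4 of the present frame.

```python
def interpolate_lsp_12k2(q4_prev, q2_cur, q4_cur):
    """12.2 kbit/s mode: subframes 2 and 4 are transmitted, 1 and 3 are interpolated."""
    q1 = [0.5 * p + 0.5 * c for p, c in zip(q4_prev, q2_cur)]
    q3 = [0.5 * a + 0.5 * b for a, b in zip(q2_cur, q4_cur)]
    return [q1, list(q2_cur), q3, list(q4_cur)]

def interpolate_lsp_single(q4_prev, q4_cur):
    """Other modes: only subframe 4 is transmitted, subframes 1 to 3 are interpolated."""
    mix = [(0.75, 0.25), (0.5, 0.5), (0.25, 0.75), (0.0, 1.0)]
    return [[wp * p + wc * c for p, c in zip(q4_prev, q4_cur)] for wp, wc in mix]
```

Each returned vector would then be passed through the LSP to LP conversion to obtain the filter for its subframe.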

5.2.7 Monitoring resonance in the LPC spectrum (all modes) 

Resonances in the LPC filter are monitored to detect possible problem areas where divergence between the adaptive
codebook memories in the encoder and the decoder could cause unstable filters in areas with highly correlated
continuous signals. Typically, this divergence is due to channel errors.

The monitoring of resonance signals is performed using unquantized LSPs $q_i$, $i = 1, \ldots, 10$. The LSPs are available
after the LP to LSP conversion in clause 5.2.3. The algorithm utilises the fact that LSPs are closely located at a peak in
the spectrum. First, two distances, $dist_1$ and $dist_2$, are calculated in two different regions, defined as

$$
dist_1 = \min(q_i - q_{i+1}),\ i = 4, \ldots, 8, \qquad \text{and} \qquad dist_2 = \min(q_i - q_{i+1}),\ i = 2, 3.
$$

Either of these two minimum distance conditions must be fulfilled to classify the frame as a resonance frame and
increase the resonance counter.

    if (dist_1 < TH_1) OR (dist_2 < TH_2)
        counter = counter + 1
    else
        counter = 0

$TH_1 = 0.046$ is a fixed threshold while the second one is depending on $q_2$ according to:

$$
TH_2 =
\begin{cases}
0.018, & q_2 > 0.98, \\
0.024, & 0.93 < q_2 \le 0.98, \\
0.034, & \text{otherwise.}
\end{cases}
$$

12 consecutive resonance frames are needed to indicate possible problem conditions, otherwise the LSP_flag is cleared.

    if (counter >= 12)
        counter = 12
        LSP_flag = 1
    else
        LSP_flag = 0
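
The per-frame update can be sketched as below. This is an illustration with hypothetical function names; the list `q` is assumed to hold the unquantized LSPs $q_1, \ldots, q_{10}$ in the cosine domain, so that `q[0]` is $q_1$. The boundary handling of the thresholds follows the text above and should be checked against the ANSI-C code in [4].

```python
TH1 = 0.046

def th2(q2):
    """Second threshold, selected from the value of q_2."""
    if q2 > 0.98:
        return 0.018
    if q2 > 0.93:
        return 0.024
    return 0.034

def update_resonance(q, counter):
    """Return the updated (counter, lsp_flag) for one frame."""
    dist1 = min(q[i - 1] - q[i] for i in range(4, 9))   # q_i - q_{i+1}, i = 4..8
    dist2 = min(q[i - 1] - q[i] for i in range(2, 4))   # q_i - q_{i+1}, i = 2, 3
    if dist1 < TH1 or dist2 < th2(q[1]):
        counter += 1
    else:
        counter = 0
    if counter >= 12:
        return 12, 1
    return counter, 0
```

The counter saturates at 12, so the flag stays set for as long as resonance frames continue and is cleared by the first frame that fails both distance conditions.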



5.3 Open-loop pitch analysis 

Open-loop pitch analysis is performed in order to simplify the pitch analysis and confine the closed-loop pitch search to 
a small number of lags around the open-loop estimated lags. 

Open-loop pitch estimation is based on the weighted speech signal $s_w(n)$ which is obtained by filtering the input
speech signal through the weighting filter $W(z) = A(z/\gamma_1) / A(z/\gamma_2)$. That is, in a subframe of size $L$, the weighted
speech is given by:

$$
s_w(n) = s(n) + \sum_{i=1}^{10} a_i \gamma_1^i\, s(n-i) - \sum_{i=1}^{10} a_i \gamma_2^i\, s_w(n-i), \qquad n = 0, \ldots, L-1. \qquad (29)
$$



12.2 kbit/s mode 

Open-loop pitch analysis is performed twice per frame (each 10 ms) to find two estimates of the pitch lag in each frame.
Open-loop pitch analysis is performed as follows. In the first step, 3 maxima of the correlation:

$$
O_k = \sum_{n=0}^{79} s_w(n)\, s_w(n-k) \qquad (30)
$$

are found in the three ranges:

    i = 3:   18, ..., 35,
    i = 2:   36, ..., 71,
    i = 1:   72, ..., 143.

The retained maxima $O_{t_i}$, $i = 1, \ldots, 3$, are normalized by dividing by $\sqrt{\sum_n s_w^2(n - t_i)}$, $i = 1, \ldots, 3$, respectively. The
normalized maxima and corresponding delays are denoted by $(M_i, t_i)$, $i = 1, \ldots, 3$. The winner, $T_{op}$, among the three
normalized correlations is selected by favouring the delays with the values in the lower range. This is performed by
weighting the normalized correlations corresponding to the longer delays. The best open-loop delay $T_{op}$ is determined
as follows:

    T_op = t_1
    M(T_op) = M_1
    if M_2 > 0.85 M(T_op)
        M(T_op) = M_2
        T_op = t_2
    end
    if M_3 > 0.85 M(T_op)
        M(T_op) = M_3
        T_op = t_3
    end

This procedure of dividing the delay range into 3 sections and favouring the lower sections is used to avoid choosing
pitch multiples.
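
The selection rule can be sketched as follows. The function name and the representation of the three candidates as `(M_i, t_i)` pairs are assumptions made for this example; the computation of the normalized maxima themselves is not shown.

```python
def select_open_loop_lag(candidates):
    """Pick T_op from [(M_1, t_1), (M_2, t_2), (M_3, t_3)].

    Candidate 1 comes from the longest-delay range and candidate 3 from the shortest.
    A shorter delay wins if its normalized maximum exceeds 0.85 times the current best.
    """
    best_m, best_t = candidates[0]
    for m, t in candidates[1:]:
        if m > 0.85 * best_m:
            best_m, best_t = m, t
    return best_t
```

For example, with candidates `[(1.0, 120), (0.9, 60), (0.5, 30)]` the delay 60 is selected even though the delay 120 has the larger normalized correlation, which is how a pitch multiple is avoided.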

10.2 kbit/s mode 

Open-loop pitch analysis is performed twice per frame (every 10 ms) to find two estimates of the pitch lag in each 
frame. 

The open-loop pitch analysis is performed as follows. First, the correlation of weighted speech is determined for each
pitch lag value $d$ by:

$$
C(d) = \sum_{n=0}^{79} s_w(n)\, s_w(n-d)\, w(d), \qquad d = 20, \ldots, 143, \qquad (31)
$$

where $w(d)$ is a weighting function. The estimated pitch-lag is the delay that maximises the weighted correlation
function $C(d)$. The weighting emphasises lower pitch lag values reducing the likelihood of selecting a multiple of the
correct delay. The weighting function consists of two parts: a low pitch lag emphasis function, $w_l(d)$, and a previous
frame lag neighbouring emphasis function, $w_n(d)$:

$$
w(d) = w_l(d)\, w_n(d). \qquad (32)
$$

The low pitch lag emphasis function is given by:

$$
w_l(d) = cw(d), \qquad (33)
$$

where $cw(d)$ is defined by a table in the fixed point computational description (ANSI-C code) in [4]. The previous
frame lag neighbouring emphasis function depends on the pitch lag of previous speech frames:

$$
w_n(d) =
\begin{cases}
cw(|T_{old} - d| + d_L), & v > 0.3, \\
1.0, & \text{otherwise,}
\end{cases}
\qquad (34)
$$

where $d_L = 20$, $T_{old}$ is the median filtered pitch lag of 5 previous voiced speech half-frames, and $v$ is an adaptive
parameter. If the frame is classified as voiced by having the open-loop gain $g > 0.4$, the $v$-value is set to 1.0 for the
next frame. Otherwise, the $v$-value is updated by $v = 0.9v$. The open loop gain is given by:

$$
g = \frac{\sum_{n=0}^{79} s_w(n)\, s_w(n - d_{max})}{\sum_{n=0}^{79} s_w^2(n - d_{max})}, \qquad (35)
$$

where $d_{max}$ is the pitch delay that maximizes $C(d)$. The median filter is updated only during voiced speech frames.
The weighting depends on the reliability of the old pitch lags. If previous frames have contained unvoiced speech or
silence, the weighting is attenuated through the parameter $v$.

7.95, 7.40, 6.70, 5.90 kbit/s modes 

Open-loop pitch analysis is performed twice per frame (each 10 ms) to find two estimates of the pitch lag in each frame. 

Open-loop pitch analysis is performed as follows. In the first step, 3 maxima of the correlation in equation (30) are
found in the three ranges:

    i = 3:   20, ..., 39,
    i = 2:   40, ..., 79,
    i = 1:   80, ..., 143.

The retained maxima $O_{t_i}$, $i = 1, \ldots, 3$, are normalized by dividing by $\sqrt{\sum_n s_w^2(n - t_i)}$, $i = 1, \ldots, 3$, respectively. The
normalized maxima and corresponding delays are denoted by $(M_i, t_i)$, $i = 1, \ldots, 3$. The winner, $T_{op}$, among the three
normalized correlations is selected by favouring the delays with the values in the lower range. This is performed by
weighting the normalized correlations corresponding to the longer delays. The best open-loop delay $T_{op}$ is determined
as follows:

    T_op = t_1
    M(T_op) = M_1
    if M_2 > 0.85 M(T_op)
        M(T_op) = M_2
        T_op = t_2
    end
    if M_3 > 0.85 M(T_op)
        M(T_op) = M_3
        T_op = t_3
    end

This procedure of dividing the delay range into 3 sections and favouring the lower sections is used to avoid choosing
pitch multiples.

5.15, 4.75 kbit/s modes 

Open-loop pitch analysis is performed once per frame (each 20 ms) to find an estimate of the pitch lag in each frame. 

Open-loop pitch analysis is performed as follows. In the first step, 3 maxima of the correlation in equation (30) are
found in the three ranges:

    i = 3:   20, ..., 39,
    i = 2:   40, ..., 79,
    i = 1:   80, ..., 143.

The retained maxima $O_{t_i}$, $i = 1, \ldots, 3$, are normalized by dividing by $\sqrt{\sum_n s_w^2(n - t_i)}$, $i = 1, \ldots, 3$, respectively. The
normalized maxima and corresponding delays are denoted by $(M_i, t_i)$, $i = 1, \ldots, 3$. The winner, $T_{op}$, among the three
normalized correlations is selected by favouring the delays with the values in the lower range. This is performed by
weighting the normalized correlations corresponding to the longer delays. The best open-loop delay $T_{op}$ is determined
as follows:

    T_op = t_1
    M(T_op) = M_1
    if M_2 > 0.85 M(T_op)
        M(T_op) = M_2
        T_op = t_2
    end
    if M_3 > 0.85 M(T_op)
        M(T_op) = M_3
        T_op = t_3
    end

This procedure of dividing the delay range into 3 sections and favouring the lower sections is used to avoid choosing
pitch multiples.

5.4 Impulse response computation (all modes) 

The impulse response, $h(n)$, of the weighted synthesis filter $H(z)W(z) = A(z/\gamma_1) / [\hat{A}(z) A(z/\gamma_2)]$ is computed
each subframe. This impulse response is needed for the search of adaptive and fixed codebooks. The impulse response
$h(n)$ is computed by filtering the vector of coefficients of the filter $A(z/\gamma_1)$ extended by zeros through the two filters
$1/\hat{A}(z)$ and $1/A(z/\gamma_2)$.

5.5 Target signal computation (all modes) 

The target signal for adaptive codebook search is usually computed by subtracting the zero input response of the
weighted synthesis filter $H(z)W(z) = A(z/\gamma_1) / [\hat{A}(z) A(z/\gamma_2)]$ from the weighted speech signal $s_w(n)$. This is
performed on a subframe basis.

An equivalent procedure for computing the target signal, which is used in the present document, is the filtering of the
LP residual signal $res_{LP}(n)$ through the combination of synthesis filter $1/\hat{A}(z)$ and the weighting filter
$A(z/\gamma_1) / A(z/\gamma_2)$. After determining the excitation for the subframe, the initial states of these filters are updated by
filtering the difference between the LP residual and excitation. The memory update of these filters is explained in clause
5.9.

The residual signal $res_{LP}(n)$ which is needed for finding the target vector is also used in the adaptive codebook
search to extend the past excitation buffer. This simplifies the adaptive codebook search procedure for delays less than
the subframe size of 40 as will be explained in the next clause. The LP residual is given by:

$$
res_{LP}(n) = s(n) + \sum_{i=1}^{10} \hat{a}_i\, s(n-i). \qquad (36)
$$

5.6 Adaptive codebook 
5.6.1 Adaptive codebook search



Adaptive codebook search is performed on a subframe basis. It consists of performing closed-loop pitch search, and 
then computing the adaptive codevector by interpolating the past excitation at the selected fractional pitch lag. 

The adaptive codebook parameters (or pitch parameters) are the delay and gain of the pitch filter. In the adaptive 
codebook approach for implementing the pitch filter, the excitation is repeated for delays less than the subframe length. 
In the search stage, the excitation is extended by the LP residual to simplify the closed-loop search. 

12.2 kbit/s mode 

In the first and third subframes, a fractional pitch delay is used with resolutions: 1/6 in the range $[17\,3/6,\ 94\,3/6]$ and
integers only in the range [95, 143]. For the second and fourth subframes, a pitch resolution of 1/6 is always used in the
range $[T_1 - 5\,3/6,\ T_1 + 4\,3/6]$, where $T_1$ is nearest integer to the fractional pitch lag of the previous (1st or 3rd)
subframe, bounded by 18...143.

Closed-loop pitch analysis is performed around the open-loop pitch estimates on a subframe basis. In the first (and
third) subframe the range $T_{op} \pm 3$, bounded by 18...143, is searched. For the other subframes, closed-loop pitch
analysis is performed around the integer pitch selected in the previous subframe, as described above. The pitch delay is
encoded with 9 bits in the first and third subframes and the relative delay of the other subframes is encoded with 6 bits.

The closed-loop pitch search is performed by minimizing the mean-square weighted error between the original and
synthesized speech. This is achieved by maximizing the term:

$$
R(k) = \frac{\sum_{n=0}^{39} x(n)\, y_k(n)}{\sqrt{\sum_{n=0}^{39} y_k(n)\, y_k(n)}}, \qquad (37)
$$

where $x(n)$ is the target signal and $y_k(n)$ is the past filtered excitation at delay $k$ (past excitation convolved with
$h(n)$). Note that the search range is limited around the open-loop pitch as explained earlier.

The convolution $y_k(n)$ is computed for the first delay $t_{min}$ in the searched range, and for the other delays in the
search range $k = t_{min} + 1, \ldots, t_{max}$, it is updated using the recursive relation:

$$
y_k(n) = y_{k-1}(n-1) + u(-k)\, h(n), \qquad n = 1, \ldots, 39, \qquad (38)
$$

and $y_k(0) = u(-k)\, h(0)$, where $u(n)$, $n = -(143 + 11), \ldots, 39$, is the excitation buffer. Note that in the search stage,
the samples $u(n)$, $n = 0, \ldots, 39$, are not known, and they are needed for pitch delays less than 40. To simplify the
search, the LP residual is copied to $u(n)$ in order to make the relation in equation (38) valid for all delays.
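
The recursive update of equation (38) can be checked against a direct convolution with the sketch below. It is a floating point illustration with hypothetical function names. The excitation buffer is assumed to be a list in which sample $u(m)$ is stored at index `m + offset`, with positions $m = 0, \ldots, 39$ already filled with the LP residual as described above.

```python
def filtered_excitation_direct(u, h, k, offset, n_sub=40):
    """y_k(n) = sum_{i=0}^{n} u(n - k - i) h(i), n = 0..n_sub-1, computed directly."""
    return [sum(u[n - k - i + offset] * h[i] for i in range(n + 1))
            for n in range(n_sub)]

def filtered_excitation_update(y_prev, u, h, k, offset, n_sub=40):
    """Equation (38): obtain y_k from y_{k-1} with one multiply-add per sample."""
    u_k = u[offset - k]                                # u(-k)
    y = [u_k * h[0]]                                   # y_k(0) = u(-k) h(0)
    y += [y_prev[n - 1] + u_k * h[n] for n in range(1, n_sub)]
    return y

def correlation_term(x, y):
    """Equation (37): normalized correlation between target x and filtered excitation y."""
    energy = sum(v * v for v in y)
    return sum(a * b for a, b in zip(x, y)) / energy ** 0.5
```

The direct form costs on the order of 40 x 40 / 2 multiply-adds per delay, while the update costs 40, which is the reason the recursion is used over the whole search range. The sketch does not guard against a zero-energy `y` in `correlation_term`.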

Once the optimum integer pitch delay is determined, the fractions from -3/6 to 3/6 with a step of 1/6 around that integer
are tested. The fractional pitch search is performed by interpolating the normalized correlation in equation (37) and
searching for its maximum. The interpolation is performed using an FIR filter $b_{24}$ based on a Hamming windowed
$\sin(x)/x$ function truncated at ± 23 and padded with zeros at ± 24 ($b_{24}(24) = 0$). The filter has its cut-off frequency
(-3 dB) at 3 600 Hz in the over-sampled domain. The interpolated values of $R(k)$ for the fractions -3/6 to 3/6 are
obtained using the interpolation formula:

$$
R(k)_t = \sum_{i=0}^{3} R(k-i)\, b_{24}(t + i \cdot 6) + \sum_{i=0}^{3} R(k+1+i)\, b_{24}(6 - t + i \cdot 6), \qquad t = 0, \ldots, 5, \qquad (39)
$$

where $t = 0, \ldots, 5$ corresponds to the fractions 0, 1/6, 2/6, 3/6, -2/6, and -1/6, respectively. Note that it is necessary to
compute the correlation terms in equation (37) using a range $t_{min} - 4$, $t_{max} + 4$, to allow for the proper interpolation.

Once the fractional pitch lag is determined, the adaptive codebook vector $v(n)$ is computed by interpolating the past
excitation signal $u(n)$ at the given integer delay $k$ and phase (fraction) $t$:

$$
v(n) = \sum_{i=0}^{9} u(n-k-i)\, b_{60}(t + i \cdot 6) + \sum_{i=0}^{9} u(n-k+1+i)\, b_{60}(6 - t + i \cdot 6), \qquad n = 0, \ldots, 39, \quad t = 0, \ldots, 5. \qquad (40)
$$

The interpolation filter $b_{60}$ is based on a Hamming windowed $\sin(x)/x$ function truncated at ± 59 and padded with
zeros at ± 60 ($b_{60}(60) = 0$). The filter has a cut-off frequency (-3 dB) at 3 600 Hz in the over-sampled domain.

The adaptive codebook gain is then found by:

$$
g_p = \frac{\sum_{n=0}^{39} x(n)\, y(n)}{\sum_{n=0}^{39} y(n)\, y(n)}, \qquad \text{bounded by } 0 \le g_p \le 1.2, \qquad (41)
$$

where $y(n) = v(n) * h(n)$ is the filtered adaptive codebook vector (zero state response of $H(z)W(z)$ to $v(n)$).
The computed adaptive codebook gain is quantified using 4-bit non-uniform scalar quantization in the range [0.0,1.2].
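
As a small illustration, the bounded gain of equation (41) can be computed as follows. The function name is hypothetical, and returning 0 for an all-zero filtered vector is an assumption of this sketch rather than something stated in the text.

```python
def adaptive_codebook_gain(x, y):
    """Equation (41): g_p = <x, y> / <y, y>, bounded to the range [0, 1.2]."""
    num = sum(xn * yn for xn, yn in zip(x, y))
    den = sum(yn * yn for yn in y)
    if den <= 0.0:
        return 0.0        # assumption: guard against division by zero
    return min(max(num / den, 0.0), 1.2)
```

The unbounded ratio is the least-squares gain that minimizes the energy of $x(n) - g_p\, y(n)$; the bounds then clip it to the range covered by the quantizer.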

7.95 kbit/s mode 

In the first and third subframes, a fractional pitch delay is used with resolutions: 1/3 in the range $[19\,1/3,\ 84\,2/3]$ and
integers only in the range [85, 143]. For the second and fourth subframes, a pitch resolution of 1/3 is always used in the
range $[T_1 - 10\,2/3,\ T_1 + 9\,2/3]$, where $T_1$ is nearest integer to the fractional pitch lag of the previous (1st or 3rd)
subframe, bounded by 20...143.

Closed-loop pitch analysis is performed around the open-loop pitch estimates on a subframe basis. In the first (and
third) subframe the range $T_{op} \pm 3$, bounded by 20...143, is searched. For the other subframes, closed-loop pitch
analysis is performed around the integer pitch selected in the previous subframe, as described above. The pitch delay is
encoded with 8 bits in the first and third subframes and the relative delay of the other subframes is encoded with 6 bits.

The closed-loop pitch search is performed by minimizing the mean-square weighted error between the original and
synthesized speech. This is achieved by maximizing the term of equation (37). Note that the search range is limited
around the open-loop pitch as explained earlier.

The convolution $y_k(n)$ is computed for the first delay $t_{min}$ in the searched range, and for the other delays in the
search range $k = t_{min} + 1, \ldots, t_{max}$, it is updated using the recursive relation of equation (38).

Once the optimum integer pitch delay is determined, the fractions from -2/3 to 2/3 with a step of 1/3 around that integer
are tested. The fractional pitch search is performed by interpolating the normalized correlation in equation (37) and
searching for its maximum. Once the fractional pitch lag is determined, the adaptive codebook vector $v(n)$ is
computed by interpolating the past excitation signal $u(n)$ at the given integer delay and phase (fraction). The
interpolation is performed using two FIR filters (Hamming windowed sinc functions); one for interpolating the term in
equation (37) with the sinc truncated at ± 11 and the other for interpolating the past excitation with the sinc truncated at
± 29. The filters have their cut-off frequency (-3 dB) at 3 600 Hz in the over-sampled domain.

The adaptive codebook gain is then found as in equation (41). 

The computed adaptive codebook gain is quantified using 4-bit non-uniform scalar quantization as described in clause 
5.8. 

10.2, 7.40 kbit/s modes

In the first and third subframes, a fractional pitch delay is used with resolutions: 1/3 in the range $[19\,1/3,\ 84\,2/3]$ and
integers only in the range [85, 143]. For the second and fourth subframes, a pitch resolution of 1/3 is always used in the
range $[T_1 - 5\,2/3,\ T_1 + 4\,2/3]$, where $T_1$ is nearest integer to the fractional pitch lag of the previous (1st or 3rd)
subframe, bounded by 20...143.

Closed-loop pitch analysis is performed around the open-loop pitch estimates on a subframe basis. In the first (and
third) subframe the range $T_{op} \pm 3$, bounded by 20...143, is searched. For the other subframes, closed-loop pitch
analysis is performed around the integer pitch selected in the previous subframe, as described above. The pitch delay is
encoded with 8 bits in the first and third subframes and the relative delay of the other subframes is encoded with 5 bits.

The closed-loop pitch search is performed by minimizing the mean-square weighted error between the original and 
synthesized speech. This is achieved by maximizing the term of equation (37). Note that the search range is limited 
around the open-loop pitch as explained earlier. 

The convolution $y_k(n)$ is computed for the first delay $t_{min}$ in the searched range, and for the other delays in the
search range $k = t_{min} + 1, \ldots, t_{max}$, it is updated using the recursive relation of equation (38).

Once the optimum integer pitch delay is determined, the fractions from -2/3 to 2/3 with a step of 1/3 around that integer
are tested. The fractional pitch search is performed by interpolating the normalized correlation in equation (37) and
searching for its maximum. Once the fractional pitch lag is determined, the adaptive codebook vector $v(n)$ is
computed by interpolating the past excitation signal $u(n)$ at the given integer delay and phase (fraction). The
interpolation is performed using two FIR filters (Hamming windowed sinc functions); one for interpolating the term in
equation (37) with the sinc truncated at ± 11 and the other for interpolating the past excitation with the sinc truncated at
± 29. The filters have their cut-off frequency (-3 dB) at 3 600 Hz in the over-sampled domain.

The adaptive codebook gain is then found as in equation (41). 

The computed adaptive codebook gain (and the fixed codebook gain) is quantified using 7-bit non-uniform vector 
quantization as described in clause 5.8. 

6.70, 5.90 kbit/s modes 

In the first and third subframes, a fractional pitch delay is used with resolutions: 1/3 in the range $[19\,1/3,\ 84\,2/3]$ and
integers only in the range [85, 143]. For the second and fourth subframes, integer pitch resolution is used in the range
$[T_1 - 5,\ T_1 + 4]$, where $T_1$ is nearest integer to the fractional pitch lag of the previous (1st or 3rd) subframe, bounded
by 20...143. Additionally, a fractional resolution of 1/3 is used in the range $[T_1 - 1\,2/3,\ T_1 + 2/3]$.

Closed-loop pitch analysis is performed around the open-loop pitch estimates on a subframe basis. In the first (and
third) subframe the range $T_{op} \pm 3$, bounded by 20...143, is searched. For the other subframes, closed-loop pitch
analysis is performed around the integer pitch selected in the previous subframe, as described above. The pitch delay is
encoded with 8 bits in the first and third subframes and the relative delay of the other subframes is encoded with 4 bits.

The closed-loop pitch search is performed by minimizing the mean-square weighted error between the original and 
synthesized speech. This is achieved by maximizing the term of equation (37). Note that the search range is limited 
around the open-loop pitch as explained earlier. 

The convolution $y_k(n)$ is computed for the first delay $t_{min}$ in the searched range, and for the other delays in the
search range $k = t_{min} + 1, \ldots, t_{max}$, it is updated using the recursive relation of equation (38).

Once the optimum integer pitch delay is determined, the fractions from -2/3 to 2/3 with a step of 1/3 around that integer
are tested. The fractional pitch search is performed by interpolating the normalized correlation in equation (37) and
searching for its maximum. Once the fractional pitch lag is determined, the adaptive codebook vector $v(n)$ is
computed by interpolating the past excitation signal $u(n)$ at the given integer delay and phase (fraction). The
interpolation is performed using two FIR filters (Hamming windowed sinc functions); one for interpolating the term in
equation (37) with the sinc truncated at ± 11 and the other for interpolating the past excitation with the sinc truncated at
± 29. The filters have their cut-off frequency (-3 dB) at 3 600 Hz in the over-sampled domain.

The adaptive codebook gain is then found as in equation (41). 

The computed adaptive codebook gain (and the fixed codebook gain) is quantified using vector quantization as 
described in clause 5.8. 

5.15, 4.75 kbit/s modes 

In the first subframe, a fractional pitch delay is used with resolutions: 1/3 in the range $[19\,1/3,\ 84\,2/3]$ and integers
only in the range [85, 143]. For the second, third, and fourth subframes, integer pitch resolution is used in the range
$[T_1 - 5,\ T_1 + 4]$, where $T_1$ is nearest integer to the fractional pitch lag of the previous subframe, bounded by 20...143.
Additionally, a fractional resolution of 1/3 is used in the range $[T_1 - 1\,2/3,\ T_1 + 2/3]$.

Closed-loop pitch analysis is performed around the open-loop pitch estimates on a subframe basis. In the first subframe 
the range $T_{op} \pm 5$, bounded by 20...143, is searched. For the other subframes, closed-loop pitch analysis is performed
around the integer pitch selected in the previous subframe, as described above. The pitch delay is encoded with 8 bits in 
the first subframe and the relative delay of the other subframes is encoded with 4 bits. 
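
As a consistency check on the bit allocation above, the following Python sketch enumerates the candidate delays for these modes. It assumes that the previous-subframe lag lies far enough from the bounds 20...143 that no clipping occurs; the function names are illustrative and are not taken from the ANSI-C code in [4].

```python
from fractions import Fraction

THIRD = Fraction(1, 3)

def first_subframe_lags():
    """Candidate pitch delays of the first subframe (5.15, 4.75 kbit/s modes)."""
    lo, hi = 19 + THIRD, 84 + 2 * THIRD          # 1/3 resolution range
    n_steps = int((hi - lo) / THIRD)
    fractional = [lo + k * THIRD for k in range(n_steps + 1)]
    integers = [Fraction(t) for t in range(85, 144)]  # integer-only range
    return fractional + integers

def relative_lags(t1):
    """Candidate delays of subframes 2 to 4 around the previous integer lag t1.

    Assumption: t1 is far enough from 20 and 143 that the range is not clipped.
    """
    integers = {Fraction(t) for t in range(t1 - 5, t1 + 5)}       # [t1-5, t1+4]
    fractional = {Fraction(3 * t1 + k, 3) for k in range(-5, 3)}  # [t1-5/3, t1+2/3]
    return sorted(integers | fractional)

print(len(first_subframe_lags()))  # 256 candidates, i.e. 8 bits
print(len(relative_lags(60)))      # 16 candidates, i.e. 4 bits
```

The two counts match the 8-bit and 4-bit delay fields stated above.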

The closed-loop pitch search is performed by minimizing the mean-square weighted error between the original and 
synthesized speech. This is achieved by maximizing the term of equation (37). Note that the search range is limited 
around the open-loop pitch as explained earlier. 

The convolution $y_k(n)$ is computed for the first delay $t_{min}$ in the searched range, and for the other delays in the
search range $k = t_{min} + 1, \ldots, t_{max}$, it is updated using the recursive relation of equation (38).

Once the optimum integer pitch delay is determined, the fractions from -2/3 to 2/3 with a step of 1/3 around that integer
are tested. The fractional pitch search is performed by interpolating the normalized correlation in equation (37) and
searching for its maximum. Once the fractional pitch lag is determined, the adaptive codebook vector $v(n)$ is
computed by interpolating the past excitation signal $u(n)$ at the given integer delay and phase (fraction). The
interpolation is performed using two FIR filters (Hamming windowed sinc functions); one for interpolating the term in
equation (37) with the sinc truncated at ±11 and the other for interpolating the past excitation with the sinc truncated at
±29. The filters have their cut-off frequency (-3 dB) at 3 600 Hz in the over-sampled domain.

The adaptive codebook gain is then found as in equation (41). 

The computed adaptive codebook gain (and the fixed codebook gain) is quantified using vector quantization as 
described in clause 5.8. 
5.6.2 Adaptive codebook gain control (all modes)



The average adaptive codebook gain is calculated if the LSP_flag is set and the unquantized adaptive codebook gain
exceeds the gain threshold $GP_{th} = 0.95$.



The average gain is calculated from the present unquantized gain and the quantized gains of the seven previous
subframes. That is,

$$GP_{ave} = \mathrm{mean}\{ g_p(n), \hat{g}_p(n-1), \hat{g}_p(n-2), \ldots, \hat{g}_p(n-7) \}$$

where $n$ is the current subframe.



If the average adaptive codebook gain exceeds the threshold $GP_{th}$, the unquantized gain is limited to the threshold value and the
GpC_flag is set to indicate the limitation.

    if (GP_ave > GP_th)
        g_p = GP_th
        GpC_flag = 1
    else
        GpC_flag = 0

The GpC_flag is used in the gain quantization in clause 5.8.
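
The gain control of this clause can be sketched in Python as follows. This is a minimal illustration, assuming that the average is the arithmetic mean of the eight gains and that the threshold is 0.95 as stated above; the function and parameter names are hypothetical and do not come from the ANSI-C code in [4].

```python
GP_TH = 0.95  # gain threshold of clause 5.6.2

def limit_adaptive_gain(gp, past_quantized_gains, lsp_flag):
    """Return (gp, gpc_flag) after adaptive codebook gain control.

    gp: unquantized adaptive codebook gain of the current subframe.
    past_quantized_gains: quantized gains of the seven previous subframes.
    lsp_flag: True when the LSP_flag is set.
    """
    if len(past_quantized_gains) != 7:
        raise ValueError("expected the gains of seven previous subframes")
    gpc_flag = 0
    if lsp_flag and gp > GP_TH:
        # Assumed arithmetic mean over the current and seven previous gains.
        average = (gp + sum(past_quantized_gains)) / 8.0
        if average > GP_TH:
            gp = GP_TH
            gpc_flag = 1
    return gp, gpc_flag
```

A call such as `limit_adaptive_gain(1.1, [1.0] * 7, True)` limits the gain to the threshold and sets the flag, whereas the same gain with a history of low gains is passed through unchanged.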

5.7 Algebraic codebook 
5.7.1 Algebraic codebook structure 

The algebraic codebook structure is based on interleaved single-pulse permutation (ISPP) design. 

12.2 kbit/s mode 

In this codebook, the innovation vector contains 10 non-zero pulses. All pulses can have the amplitudes +1 or -1. The
40 positions in a subframe are divided into 5 tracks, where each track contains two pulses, as shown in table 3.

Table 3: Potential positions of individual pulses in the algebraic codebook, 12.2 kbit/s.

Track   Pulse     Positions
1       i0, i5    0, 5, 10, 15, 20, 25, 30, 35
2       i1, i6    1, 6, 11, 16, 21, 26, 31, 36
3       i2, i7    2, 7, 12, 17, 22, 27, 32, 37
4       i3, i8    3, 8, 13, 18, 23, 28, 33, 38
5       i4, i9    4, 9, 14, 19, 24, 29, 34, 39



The two pulse positions in each track are encoded with 6 bits (3 bits for the position of each pulse, 30 bits in total), and
the sign of the first pulse in the track is encoded with 1 bit (total of 5 bits).

For two pulses located in the same track, only one sign bit is needed. This sign bit indicates the sign of the first pulse.
The sign of the second pulse depends on its position relative to the first pulse. If the position of the second pulse is
smaller, then it has the opposite sign; otherwise it has the same sign as the first pulse.

All the 3-bit pulse positions are Gray coded in order to improve robustness against channel errors. This gives a total of 
35 bits for the algebraic code. 
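
The interleaved track layout and the sign rule for the second pulse of a track can be illustrated with a short Python sketch. The helper names are hypothetical; the Gray coding of the 3-bit positions is not modelled here.

```python
def track_positions(track, n_tracks=5, subframe_len=40):
    """Positions available to the pulses of one track (track is 1-based)."""
    return list(range(track - 1, subframe_len, n_tracks))

def second_pulse_sign(first_sign, first_pos, second_pos):
    """Sign of the second pulse in a track, derived from the single sign bit.

    A smaller position for the second pulse means the opposite sign,
    otherwise the sign of the first pulse is reused.
    """
    return -first_sign if second_pos < first_pos else first_sign

print(track_positions(1))             # [0, 5, 10, 15, 20, 25, 30, 35]
print(second_pulse_sign(+1, 20, 5))   # -1
print(second_pulse_sign(+1, 5, 20))   # 1
```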

10.2 kbit/s mode 

In this codebook, the innovation vector contains 8 non-zero pulses. All pulses can have the amplitudes +1 or -1. The 40
positions in a subframe are divided into 4 tracks, where each track contains two pulses, as shown in table 4.

Table 4: Potential positions of individual pulses in the algebraic codebook, 10.2 kbit/s.

Track   Pulse     Positions
1       i0, i4    0, 4, 8, 12, 16, 20, 24, 28, 32, 36
2       i1, i5    1, 5, 9, 13, 17, 21, 25, 29, 33, 37
3       i2, i6    2, 6, 10, 14, 18, 22, 26, 30, 34, 38
4       i3, i7    3, 7, 11, 15, 19, 23, 27, 31, 35, 39



The pulses are grouped into 3, 3, and 2 pulses and their positions are encoded with 10, 10, and 7 bits, respectively (total 
of 27 bits). The sign of the first pulse in each track is encoded with 1 bit (total of 4 bits). 

For two pulses located in the same track, only one sign bit is needed. This sign bit indicates the sign of the first pulse.
The sign of the second pulse depends on its position relative to the first pulse. If the position of the second pulse is
smaller, then it has the opposite sign; otherwise it has the same sign as the first pulse.

This gives a total of 31 bits for the algebraic code.

7.95, 7.40 kbit/s modes 

In this codebook, the innovation vector contains 4 non-zero pulses. All pulses can have the amplitudes +1 or -1. The 40 
positions in a subframe are divided into 4 tracks, where each track contains one pulse, as shown in table 5. 

Table 5: Potential positions of individual pulses in the algebraic codebook, 7.95, 7.40 kbit/s. 



Track   Pulse   Positions
1       i0      0, 5, 10, 15, 20, 25, 30, 35
2       i1      1, 6, 11, 16, 21, 26, 31, 36
3       i2      2, 7, 12, 17, 22, 27, 32, 37
4       i3      3, 8, 13, 18, 23, 28, 33, 38,
                4, 9, 14, 19, 24, 29, 34, 39



The pulse positions are encoded with 3, 3, 3, and 4 bits (total of 13 bits), and the sign of each pulse is encoded with
1 bit (total of 4 bits). This gives a total of 17 bits for the algebraic code.

6.70 kbit/s mode 

In this codebook, the innovation vector contains 3 non-zero pulses. All pulses can have the amplitudes +1 or -1. The 40 
positions in a subframe are divided into 3 tracks, where each track contains one pulse, as shown in table 6. 

Table 6: Potential positions of individual pulses in the algebraic codebook, 6.70 kbit/s. 



Track   Pulse   Positions
1       i0      0, 5, 10, 15, 20, 25, 30, 35
2       i1      1, 6, 11, 16, 21, 26, 31, 36,
                3, 8, 13, 18, 23, 28, 33, 38
3       i2      2, 7, 12, 17, 22, 27, 32, 37,
                4, 9, 14, 19, 24, 29, 34, 39



The pulse positions are encoded with 3, 4, and 4 bits (total of 11 bits), and the sign of each pulse is encoded with
1 bit (total of 3 bits). This gives a total of 14 bits for the algebraic code.

5.90 kbit/s mode 

In this codebook, the innovation vector contains 2 non-zero pulses. All pulses can have the amplitudes +1 or -1. The 40 
positions in a subframe are divided into 2 tracks, where each track contains one pulse, as shown in table 7. 
Table 7: Potential positions of individual pulses in the algebraic codebook, 5.90 kbit/s.

Track   Pulse   Positions
1       i0      1, 6, 11, 16, 21, 26, 31, 36,
                3, 8, 13, 18, 23, 28, 33, 38
2       i1      0, 5, 10, 15, 20, 25, 30, 35,
                1, 6, 11, 16, 21, 26, 31, 36,
                2, 7, 12, 17, 22, 27, 32, 37,
                4, 9, 14, 19, 24, 29, 34, 39



The pulse positions are encoded with 4 and 5 bits (total of 9 bits), and the sign of each pulse is encoded with 1 bit
(total of 2 bits). This gives a total of 11 bits for the algebraic code.

5.15, 4.75 kbit/s modes 

In this codebook, the innovation vector contains 2 non-zero pulses. All pulses can have the amplitudes +1 or -1. The 40 
positions in a subframe are divided into 5 tracks. Two subsets of 2 tracks each are used for each subframe with one 
pulse in each track. Different subsets of tracks are used for each subframe. The pulse positions used in each subframe 
are shown in table 8. 

Table 8: Potential positions of individual pulses in the algebraic codebook, 5.15, 4.75 kbit/s. 



Subframe   Subset   Pulse   Positions
1          1        i0      0, 5, 10, 15, 20, 25, 30, 35
                    i1      2, 7, 12, 17, 22, 27, 32, 37
           2        i0      1, 6, 11, 16, 21, 26, 31, 36
                    i1      3, 8, 13, 18, 23, 28, 33, 38
2          1        i0      0, 5, 10, 15, 20, 25, 30, 35
                    i1      3, 8, 13, 18, 23, 28, 33, 38
           2        i0      2, 7, 12, 17, 22, 27, 32, 37
                    i1      4, 9, 14, 19, 24, 29, 34, 39
3          1        i0      0, 5, 10, 15, 20, 25, 30, 35
                    i1      2, 7, 12, 17, 22, 27, 32, 37
           2        i0      1, 6, 11, 16, 21, 26, 31, 36
                    i1      4, 9, 14, 19, 24, 29, 34, 39
4          1        i0      0, 5, 10, 15, 20, 25, 30, 35
                    i1      3, 8, 13, 18, 23, 28, 33, 38
           2        i0      1, 6, 11, 16, 21, 26, 31, 36
                    i1      4, 9, 14, 19, 24, 29, 34, 39



One bit is needed to encode the subset used. The two pulse positions are encoded with 3 bits each (total of 6 bits), and
the sign of each pulse is encoded with 1 bit (total of 2 bits). This gives a total of 9 bits for the algebraic code.
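
The per-subframe bit counts quoted in this clause can be cross-checked with a small Python tally. The dictionary below only restates the numbers given above; its name and layout are illustrative.

```python
# (position bits, sign bits) per subframe, as stated in clause 5.7.1
ALGEBRAIC_BITS = {
    "12.2": (30, 5),
    "10.2": (27, 4),
    "7.95/7.40": (3 + 3 + 3 + 4, 4),
    "6.70": (3 + 4 + 4, 3),
    "5.90": (4 + 5, 2),
    "5.15/4.75": (1 + 3 + 3, 2),  # 1 subset bit plus two 3-bit positions
}

def total_bits(mode):
    """Total number of bits of the algebraic code for one subframe."""
    position_bits, sign_bits = ALGEBRAIC_BITS[mode]
    return position_bits + sign_bits

for mode in ALGEBRAIC_BITS:
    print(mode, total_bits(mode))
```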



5.7.2 Algebraic codebook search 

The algebraic codebook is searched by minimizing the mean square error between the weighted input speech and the 
weighted synthesized speech. The target signal used in the closed-loop pitch search is updated by subtracting the 
adaptive codebook contribution. That is: 



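$$x_2(n) = x(n) - \hat{g}_p\, y(n), \quad n = 0, \ldots, 39 \qquad (42)$$

where $y(n) = v(n) * h(n)$ is the filtered adaptive codebook vector and $\hat{g}_p$ is the quantified adaptive codebook gain.
If $\mathbf{c}_k$ is the algebraic codevector at index $k$, then the algebraic codebook is searched by maximizing the term:

$$A_k = \frac{(C_k)^2}{E_{D_k}} = \frac{(\mathbf{d}^t \mathbf{c}_k)^2}{\mathbf{c}_k^t \boldsymbol{\Phi}\, \mathbf{c}_k} \qquad (43)$$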
where $\mathbf{d} = \mathbf{H}^t \mathbf{x}_2$ is the correlation between the target signal $x_2(n)$ and the impulse response $h(n)$, $\mathbf{H}$ is the
lower triangular Toeplitz convolution matrix with diagonal $h(0)$ and lower diagonals $h(1), \ldots, h(39)$, and
$\boldsymbol{\Phi} = \mathbf{H}^t \mathbf{H}$ is the matrix of correlations of $h(n)$. The vector $\mathbf{d}$ (backward filtered target) and the matrix $\boldsymbol{\Phi}$ are
computed prior to the codebook search. The elements of the vector $\mathbf{d}$ are computed by:

$$d(n) = \sum_{i=n}^{39} x_2(i)\, h(i-n), \quad n = 0, \ldots, 39, \qquad (44)$$

and the elements of the symmetric matrix $\boldsymbol{\Phi}$ are computed by:

$$\phi(i,j) = \sum_{n=j}^{39} h(n-i)\, h(n-j), \quad (j \geq i). \qquad (45)$$

The algebraic structure of the codebooks allows for very fast search procedures since the innovation vector $\mathbf{c}_k$ contains
only a few non-zero pulses. The correlation in the numerator of equation (43) is given by:

$$C = \sum_{i=0}^{N_p-1} \vartheta_i\, d(m_i), \qquad (46)$$

where $m_i$ is the position of the $i$th pulse, $\vartheta_i$ is its amplitude, and $N_p$ is the number of pulses ($N_p = 10$). The
energy in the denominator of equation (43) is given by:

$$E_D = \sum_{i=0}^{N_p-1} \phi(m_i, m_i) + 2 \sum_{i=0}^{N_p-2} \sum_{j=i+1}^{N_p-1} \vartheta_i \vartheta_j\, \phi(m_i, m_j). \qquad (47)$$

To simplify the search procedure, the pulse amplitudes are preset by the mere quantization of an appropriate signal
$b(n)$. This is simply done by setting the amplitude of a pulse at a certain position equal to the sign of $b(n)$ at that
position. The simplification proceeds as follows (prior to the codebook search). First, the sign signal
$s_b(n) = \mathrm{sign}[b(n)]$ and the signal $d'(n) = d(n)\, s_b(n)$ are computed. Second, the matrix $\boldsymbol{\Phi}$ is modified by
including the sign information; that is, $\phi'(i,j) = s_b(i)\, s_b(j)\, \phi(i,j)$. The correlation in equation (46) is now given by:

$$C = \sum_{i=0}^{N_p-1} d'(m_i) \qquad (48)$$

and the energy in equation (47) is given by:

$$E_D = \sum_{i=0}^{N_p-1} \phi'(m_i, m_i) + 2 \sum_{i=0}^{N_p-2} \sum_{j=i+1}^{N_p-1} \phi'(m_i, m_j). \qquad (49)$$
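
The quantities of equations (44) to (49) can be sketched in floating-point Python as follows. This is an illustration only, not the fixed-point procedure of [4]; all function names are hypothetical, and treating a zero value of b(n) as positive is an assumption of this sketch.

```python
def backward_filtered_target(x2, h):
    """d(n) = sum_{i=n}^{N-1} x2(i) h(i-n), equation (44)."""
    n_len = len(x2)
    return [sum(x2[i] * h[i - n] for i in range(n, n_len)) for n in range(n_len)]

def correlation_matrix(h):
    """phi(i, j) = sum_{n=j}^{N-1} h(n-i) h(n-j) for j >= i, equation (45)."""
    n_len = len(h)
    phi = [[0.0] * n_len for _ in range(n_len)]
    for i in range(n_len):
        for j in range(i, n_len):
            value = sum(h[n - i] * h[n - j] for n in range(j, n_len))
            phi[i][j] = phi[j][i] = value  # the matrix is symmetric
    return phi

def criterion(d, phi, positions, signs):
    """Return (C^2, E_D) from equations (46) and (47)."""
    c = sum(s * d[m] for m, s in zip(positions, signs))
    e = sum(phi[m][m] for m in positions)
    for a in range(len(positions) - 1):
        for b in range(a + 1, len(positions)):
            e += 2.0 * signs[a] * signs[b] * phi[positions[a]][positions[b]]
    return c * c, e

def criterion_with_preset_signs(d, phi, b_signal, positions):
    """Return (C^2, E_D) from equations (48) and (49), signs preset from b(n)."""
    s = [1.0 if value >= 0.0 else -1.0 for value in b_signal]  # zero -> +1 (assumed)
    c = sum(d[m] * s[m] for m in positions)
    e = sum(phi[m][m] for m in positions)  # s(m)^2 = 1 on the diagonal
    for a in range(len(positions) - 1):
        for b in range(a + 1, len(positions)):
            i, j = positions[a], positions[b]
            e += 2.0 * s[i] * s[j] * phi[i][j]
    return c * c, e
```

The search then keeps the position combination with the largest ratio of the two returned values. With the signs taken from b(n), both functions return the same pair, which is the point of the simplification.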



12.2 kbit/s mode 



In this case the signal $b(n)$, used for presetting the amplitudes, is a sum of the normalized $d(n)$ vector and
normalized long-term prediction residual $res_{LTP}(n)$:

$$b(n) = \frac{res_{LTP}(n)}{\sqrt{\sum_{i=0}^{39} res_{LTP}(i)\, res_{LTP}(i)}} + \frac{d(n)}{\sqrt{\sum_{i=0}^{39} d(i)\, d(i)}}, \quad n = 0, \ldots, 39. \qquad (50)$$

Having preset the pulse amplitudes, as explained above, the optimal pulse positions are determined using an
efficient non-exhaustive analysis-by-synthesis search technique. In this technique, the term in equation (43) is tested for 
a small percentage of position combinations. 

First, for each of the five tracks the pulse positions with maximum absolute values of $b(n)$ are searched. From these
the global maximum value for all the pulse positions is selected. The first pulse $i_0$ is always set into the position
corresponding to the global maximum value.

Next, four iterations are carried out. During each iteration the position of pulse $i_1$ is set to the local maximum of one
track. The rest of the pulses are searched in pairs by sequentially searching each of the pulse pairs $\{i_2, i_3\}$, $\{i_4, i_5\}$, $\{i_6, i_7\}$
and $\{i_8, i_9\}$ in nested loops. Every pulse has 8 possible positions, i.e., there are four 8x8-loops, resulting in 256 different
combinations of pulse positions for each iteration.

In each iteration all the 9 pulse starting positions are cyclically shifted, so that the pulse pairs are changed and the pulse
$i_1$ is placed in a local maximum of a different track. The rest of the pulses are searched also for the other positions in the
tracks. At least one pulse is located in a position corresponding to the global maximum and one pulse is located in a 
position corresponding to one of the 4 local maxima. 

A special feature incorporated in the codebook is that the selected codevector is filtered through an adaptive pre-filter
$F_E(z)$ which enhances special spectral components in order to improve the synthesized speech quality. Here the filter
$F_E(z) = 1/(1 - \beta z^{-T})$ is used, where $T$ is the nearest integer pitch lag to the closed-loop fractional pitch lag of the
subframe, and $\beta$ is a pitch gain. In the present document, $\beta$ is given by the quantified pitch gain bounded by [0.0, 1.0].
Note that prior to the codebook search, the impulse response $h(n)$ must include the pre-filter $F_E(z)$. That is,

$$h(n) = h(n) + \beta\, h(n-T), \quad n = T, \ldots, 39.$$

The fixed codebook gain is then found by:

$$g_c = \frac{\mathbf{x}_2^t \mathbf{z}}{\mathbf{z}^t \mathbf{z}} \qquad (51)$$

where $\mathbf{x}_2$ is the target vector for fixed codebook search and $\mathbf{z}$ is the fixed codebook vector convolved with $h(n)$,

$$z(n) = \sum_{i=0}^{n} c(i)\, h(n-i), \quad n = 0, \ldots, 39. \qquad (52)$$
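
Equations (51) and (52) can be sketched in floating-point Python as follows. The function names are hypothetical, and returning a zero gain for a zero-energy filtered codevector is an assumption of this sketch rather than a rule stated in the text.

```python
def filtered_codevector(c, h):
    """z(n) = sum_{i=0}^{n} c(i) h(n-i), equation (52)."""
    return [sum(c[i] * h[n - i] for i in range(n + 1)) for n in range(len(c))]

def fixed_codebook_gain(x2, z):
    """g_c = (x2^t z) / (z^t z), equation (51)."""
    numerator = sum(a * b for a, b in zip(x2, z))
    denominator = sum(b * b for b in z)
    # Assumed guard: avoid a division by zero for an all-zero z.
    return numerator / denominator if denominator > 0.0 else 0.0
```

If the target is an exact multiple of the filtered codevector, the function returns that multiple, which is the least-squares optimum.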

10.2 kbit/s mode 

In this case the signal $b(n)$, used for presetting the amplitudes, is given by equation (50). Having preset the pulse
amplitudes, as explained above, the optimal pulse positions are determined using an efficient non-exhaustive
analysis-by-synthesis search technique. In this technique, the term in equation (43) is tested for a small percentage of
position combinations.

A special feature incorporated in the codebook is that the selected codevector is filtered through an adaptive pre-filter
$F_E(z)$ which enhances special spectral components in order to improve the synthesized speech quality. Here the filter
$F_E(z) = 1/(1 - \beta z^{-T})$ is used, where $T$ is the nearest integer pitch lag to the closed-loop fractional pitch lag of the
subframe, and $\beta$ is a pitch gain. In the present document, $\beta$ is given by the quantified pitch gain bounded by [0.0, 0.8].
Note that prior to the codebook search, the impulse response $h(n)$ must include the pre-filter $F_E(z)$. That is,

$$h(n) = h(n) + \beta\, h(n-T), \quad n = T, \ldots, 39.$$

The fixed codebook gain is then found by equation (51). 

7.95, 7.40 kbit/s modes 

In this case the signal $b(n)$, used for presetting the amplitudes, is equal to the signal $d(n)$. Having preset the pulse
amplitudes, as explained above, the optimal pulse positions are determined using an efficient non-exhaustive
analysis-by-synthesis search technique. In this technique, the term in equation (43) is tested for a small percentage of
position combinations.

A special feature incorporated in the codebook is that the selected codevector is filtered through an adaptive pre-filter
$F_E(z)$ which enhances special spectral components in order to improve the synthesized speech quality. Here the filter
$F_E(z) = 1/(1 - \beta z^{-T})$ is used, where $T$ is the nearest integer pitch lag to the closed-loop fractional pitch lag of the
subframe, and $\beta$ is a pitch gain. In the present document, $\beta$ is given by the quantified pitch gain bounded by [0.0, 0.8].
Note that prior to the codebook search, the impulse response $h(n)$ must include the pre-filter $F_E(z)$. That is,

$$h(n) = h(n) + \beta\, h(n-T), \quad n = T, \ldots, 39.$$

The fixed codebook gain is then found by equation (51). 

6.70 kbit/s mode 

In this case the signal $b(n)$, used for presetting the amplitudes, is equal to the signal $d(n)$. Having preset the pulse
amplitudes, as explained above, the optimal pulse positions are determined using an efficient non-exhaustive
analysis-by-synthesis search technique. In this technique, the term in equation (43) is tested for a small percentage of
position combinations.

A special feature incorporated in the codebook is that the selected codevector is filtered through an adaptive pre-filter
$F_E(z)$ which enhances special spectral components in order to improve the synthesized speech quality. Here the filter
$F_E(z) = 1/(1 - \beta z^{-T})$ is used, where $T$ is the nearest integer pitch lag to the closed-loop fractional pitch lag of the
subframe, and $\beta$ is a pitch gain. In the present document, $\beta$ is given by the quantified pitch gain bounded by [0.0, 0.8].
Note that prior to the codebook search, the impulse response $h(n)$ must include the pre-filter $F_E(z)$. That is,

$$h(n) = h(n) + \beta\, h(n-T), \quad n = T, \ldots, 39.$$

The fixed codebook gain is then found by equation (51). 

5.90 kbit/s mode 

In this case the signal $b(n)$, used for presetting the amplitudes, is equal to the signal $d(n)$. Having preset the pulse
amplitudes, as explained above, the optimal pulse positions are determined using an exhaustive analysis-by-synthesis
search technique.

A special feature incorporated in the codebook is that the selected codevector is filtered through an adaptive pre-filter
$F_E(z)$ which enhances special spectral components in order to improve the synthesized speech quality. Here the filter
$F_E(z) = 1/(1 - \beta z^{-T})$ is used, where $T$ is the nearest integer pitch lag to the closed-loop fractional pitch lag of the
subframe, and $\beta$ is a pitch gain. In the present document, $\beta$ is given by the quantified pitch gain bounded by [0.0, 0.8].
Note that prior to the codebook search, the impulse response $h(n)$ must include the pre-filter $F_E(z)$. That is,

$$h(n) = h(n) + \beta\, h(n-T), \quad n = T, \ldots, 39.$$

The fixed codebook gain is then found by equation (51). 

5.15, 4.75 kbit/s modes 

In this case the signal $b(n)$, used for presetting the amplitudes, is equal to the signal $d(n)$. Having preset the pulse
amplitudes, as explained above, the optimal pulse positions are determined using an exhaustive analysis-by-synthesis
search technique. Note that both subsets are searched.

A special feature incorporated in the codebook is that the selected codevector is filtered through an adaptive pre-filter
$F_E(z)$ which enhances special spectral components in order to improve the synthesized speech quality. Here the filter
$F_E(z) = 1/(1 - \beta z^{-T})$ is used, where $T$ is the nearest integer pitch lag to the closed-loop fractional pitch lag of the
subframe, and $\beta$ is a pitch gain. In the present document, $\beta$ is given by the quantified pitch gain bounded by [0.0, 0.8].
Note that prior to the codebook search, the impulse response $h(n)$ must include the pre-filter $F_E(z)$. That is,

$$h(n) = h(n) + \beta\, h(n-T), \quad n = T, \ldots, 39.$$

The fixed codebook gain is then found by equation (51). 

5.8 Quantization of the adaptive and fixed codebook gains 
5.8.1 Adaptive codebook gain limitation in quantization 

If the GpC_flag is set, the limited adaptive codebook gain is used in the gain quantization in clause 5.8.2. The
quantization codebook search range is limited to only include adaptive codebook gain values less than $GP_{th}$. This is
performed in the quantization search for all modes.

5.8.2 Quantization of codebook gains 

Prediction of the fixed codebook gain (all modes) 

The fixed codebook gain quantization is performed using MA prediction with fixed coefficients. The 4th order MA
prediction is performed on the innovation energy as follows. Let $E(n)$ be the mean-removed innovation energy (in
dB) at subframe $n$, and given by:

$$E(n) = 10 \log \left( \frac{1}{N}\, g_c^2 \sum_{i=0}^{N-1} c^2(i) \right) - \bar{E}, \qquad (53)$$

where $N = 40$ is the subframe size, $c(i)$ is the fixed codebook excitation, and $\bar{E}$ (in dB) is the mean of the
innovation energy. The predicted energy is given by:

$$\tilde{E}(n) = \sum_{i=1}^{4} b_i\, \hat{R}(n-i), \qquad (54)$$

where $[b_1\ b_2\ b_3\ b_4] = [0.68\ 0.58\ 0.34\ 0.19]$ are the MA prediction coefficients, and $\hat{R}(k)$ is the quantified
prediction error at subframe $k$. The predicted energy is used to compute a predicted fixed-codebook gain $g'_c$ as in
equation (53) (by substituting $E(n)$ by $\tilde{E}(n)$ and $g_c$ by $g'_c$). This is done as follows. First, the mean innovation
energy is found by:

$$E_I = 10 \log \left( \frac{1}{N} \sum_{j=0}^{N-1} c^2(j) \right) \qquad (55)$$

and then the predicted gain $g'_c$ is found by:

$$g'_c = 10^{0.05 (\tilde{E}(n) + \bar{E} - E_I)}. \qquad (56)$$

A correction factor between the gain $g_c$ and the estimated one $g'_c$ is given by:

$$\gamma_{gc} = g_c / g'_c. \qquad (57)$$

Note that the prediction error is given by:

$$R(n) = E(n) - \tilde{E}(n) = 20 \log(\gamma_{gc}). \qquad (58)$$
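
The gain prediction can be sketched in floating-point Python as follows. This is a minimal illustration that assumes base-10 logarithms (the energies are in dB) and a codevector with at least one non-zero sample; the function names are hypothetical and are not those of the ANSI-C code in [4].

```python
import math

MA_COEFFS = (0.68, 0.58, 0.34, 0.19)  # b_1 .. b_4 of equation (54)

def predicted_gain(code, past_errors, mean_energy_db):
    """Predicted fixed codebook gain g'_c from equations (54) to (56).

    code: fixed codebook vector c(0..N-1), not all zero.
    past_errors: quantified prediction errors R(n-1), ..., R(n-4) in dB.
    mean_energy_db: mode-dependent mean innovation energy in dB.
    """
    predicted_energy = sum(b * r for b, r in zip(MA_COEFFS, past_errors))      # (54)
    innovation_energy = 10.0 * math.log10(sum(c * c for c in code) / len(code))  # (55)
    return 10.0 ** (0.05 * (predicted_energy + mean_energy_db - innovation_energy))  # (56)

def prediction_error(gc, gc_pred):
    """R(n) = 20 log10(gamma_gc) with gamma_gc = gc / gc_pred, (57) and (58)."""
    return 20.0 * math.log10(gc / gc_pred)
```

For a codevector with four unit pulses in 40 samples, a zero prediction history and a mean energy of 36 dB, the predicted gain is 10 to the power 2.3, i.e. roughly 200.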
12.2 kbit/s mode

The correction factor $\gamma_{gc}$ is computed using a mean energy value, $\bar{E} = 36$ dB. The correction factor $\gamma_{gc}$ is
quantified using a 5-bit codebook. The quantization table search is performed by minimizing the error:

$$E_Q = (g_c - \hat{\gamma}_{gc}\, g'_c)^2. \qquad (59)$$

Once the optimum value $\hat{\gamma}_{gc}$ is chosen, the quantified fixed codebook gain is given by $\hat{g}_c = \hat{\gamma}_{gc}\, g'_c$.

10.2 kbit/s mode 

The correction factor $\gamma_{gc}$ is computed using a mean energy value, $\bar{E} = 33$ dB. The adaptive codebook gain $g_p$ and
the correction factor $\gamma_{gc}$ are jointly vector quantized using a 7-bit codebook. The gain codebook search is performed
by minimizing equation (63).

7.95 kbit/s mode 

The correction factor $\gamma_{gc}$ is computed using a mean energy value, $\bar{E} = 36$ dB. The same scalar codebooks as for the
12.2 kbit/s mode are used for quantization of the adaptive codebook gain $g_p$ and the correction factor $\gamma_{gc}$. The search
of the codebooks starts with finding 3 candidates for the adaptive codebook gain. These candidates are the best
codebook value in scalar quantization and the two adjacent codebook values. These 3 candidates are searched together
with the correction factor codebook minimizing the term of equation (63).

An adaptor based on the coding gain in the adaptive codebook decides if the coding gain is low. If this is the case, the 
correction factor codebook is searched once more minimizing a modified criterion in order to find a new quantized 
fixed codebook gain. The modified criterion is given by: 

^mod =(!-«) ■I|cp-(^c- rgc-^c) +Cc\4eZ-4^c] (60) 

where $E_{res}$ and $E_{exc}$ are the energy (the squared norm) of the LP residual and the total excitation, respectively. The
criterion is searched with the already quantized adaptive codebook gain and the correction factor $\hat{\gamma}_{gc}$ that minimizes
(60) is selected. The balance factor $\alpha$ decides the amount of energy matching in the modified criterion. This factor is
adaptively decided based on the coding gain in the adaptive codebook as computed by:

$$ag = 10 \cdot \log_{10} \frac{\| res_{LP} \|^2}{\| res_{LP} - v \|^2}. \qquad (61)$$

If the coding gain $ag$ is less than 1 dB, the modified criterion is employed, except when an onset is detected. An onset
is said to be detected if the fixed codebook gain in the current subframe is more than twice the value of the fixed
codebook gain in the previous subframe. A hangover of 8 subframes is used in the onset detection so that the modified
criterion is not used for the next 7 subframes either if an onset is detected. The balance factor $\alpha$ is computed from the
median filtered adaptive coding gain. The current and the $ag$-values for the previous 4 subframes are median filtered to
get $ag_m$. The $\alpha$-factor is computed by:

$$\alpha = \begin{cases} 0, & ag_m > 2 \\ 0.5\,(1 - 0.5\, ag_m), & 0 < ag_m < 2 \\ 0.5, & ag_m < 0 \end{cases} \qquad (62)$$
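
The computation of the balance factor from the median filtered coding gain can be sketched in Python as follows. The sketch assumes the piecewise-linear mapping described above, with a factor of 0 above 2 dB and 0.5 below 0 dB; the function name is hypothetical, and the onset hangover logic is not modelled.

```python
from statistics import median

def balance_factor(ag_history):
    """Balance factor alpha from the current and four previous ag values (in dB).

    ag_history: coding gains ordered from oldest to newest; only the last
    five entries are used.
    """
    ag_m = median(ag_history[-5:])
    if ag_m > 2.0:
        return 0.0
    if ag_m < 0.0:
        return 0.5
    return 0.5 * (1.0 - 0.5 * ag_m)
```

The mapping is continuous at both break points, so the treatment of the exact boundary values does not change the result.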



7.40 kbit/s mode 
The correction factor $\gamma_{gc}$ is computed using a mean energy value, $\bar{E} = 30$ dB. The adaptive codebook gain $g_p$ and
the correction factor $\gamma_{gc}$ are jointly vector quantized using a 7-bit codebook. The gain codebook search is performed
by minimizing the square of the weighted error between original and reconstructed speech, which is given by:

$$E = \| \mathbf{x} - g_p \mathbf{y} - g_c \mathbf{z} \|^2 = \mathbf{x}^t\mathbf{x} + g_p^2\, \mathbf{y}^t\mathbf{y} + g_c^2\, \mathbf{z}^t\mathbf{z} - 2 g_p\, \mathbf{x}^t\mathbf{y} - 2 g_c\, \mathbf{x}^t\mathbf{z} + 2 g_p g_c\, \mathbf{y}^t\mathbf{z} \qquad (63)$$

where $\mathbf{x}$ is the target vector, $\mathbf{y}$ is the filtered adaptive codebook vector, and $\mathbf{z}$ is the filtered fixed codebook vector.
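
A joint gain codebook search that minimizes equation (63) can be sketched in Python as follows. The codebook passed to the function is a hypothetical list of (adaptive gain, correction factor) pairs; the actual quantization tables are part of the ANSI-C code in [4] and are not reproduced here.

```python
def gain_vq_search(x, y, z, gc_pred, codebook):
    """Return (index, error) of the codebook entry minimizing equation (63).

    codebook: sequence of (g_p, gamma_gc) pairs (hypothetical values); the
    fixed codebook gain of an entry is gamma_gc * gc_pred.
    """
    def dot(a, b):
        return sum(p * q for p, q in zip(a, b))

    # The six correlations are computed once, outside the search loop.
    xx, yy, zz = dot(x, x), dot(y, y), dot(z, z)
    xy, xz, yz = dot(x, y), dot(x, z), dot(y, z)

    best_index, best_error = -1, float("inf")
    for index, (gp, gamma) in enumerate(codebook):
        gc = gamma * gc_pred
        error = (xx + gp * gp * yy + gc * gc * zz
                 - 2.0 * gp * xy - 2.0 * gc * xz + 2.0 * gp * gc * yz)
        if error < best_error:
            best_index, best_error = index, error
    return best_index, best_error
```

Expanding the squared norm in this way means that each codebook entry costs only a handful of multiplications, independent of the subframe length.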

6.70 kbit/s mode 

The correction factor $\gamma_{gc}$ is computed using a mean energy value, $\bar{E} = 28.75$ dB. The adaptive codebook gain
$g_p$ and the correction factor $\gamma_{gc}$ are jointly vector quantized using a 7-bit codebook. The gain codebook search is
performed by minimizing equation (63).

5.90, 5.15 kbit/s modes 

The correction factor $\gamma_{gc}$ is computed using a mean energy value, $\bar{E} = 33$ dB. The adaptive codebook gain $g_p$ and
the correction factor $\gamma_{gc}$ are jointly vector quantized using a 6-bit codebook. The gain codebook search is performed
by minimizing equation (63).

4.75 kbit/s mode 

The correction factors are computed using a mean energy value, E =33 dB. The adaptive codebook gains and the 
correction factors are jointly vector quantized every 10 ms. This is done by minimizing a weighted sum of the error 
criterion (63) for each of the two subframes. The default value of the weighting factors is 1. If the energy of the
second subframe is more than two times the energy of the first subframe, the weight of the first subframe is set to 2. If 
the energy of the first subframe is more than four times the energy of the second subframe, the weight of the second 
subframe is set to 2. 

5.8.3 Update past quantized adaptive codebook gain buffer (all modes) 

After the gain quantization, the buffer with past adaptive codebook gains is updated, regardless of the value of the
GpC_flag. That is,

$$\hat{g}_p(n-i) = \hat{g}_p(n-i+1), \quad i = 7, \ldots, 1.$$



5.9 Memory update (all modes) 



An update of the states of the synthesis and weighting filters is needed in order to compute the target signal in the next 
subframe. 

After the two gains are quantified, the excitation signal, $u(n)$, in the present subframe is found by:

$$u(n) = \hat{g}_p\, v(n) + \hat{g}_c\, c(n), \quad n = 0, \ldots, 39, \qquad (64)$$

where $\hat{g}_p$ and $\hat{g}_c$ are the quantified adaptive and fixed codebook gains, respectively, $v(n)$ is the adaptive codebook
vector (interpolated past excitation), and $c(n)$ is the fixed codebook vector (algebraic code including pitch
sharpening). The states of the filters can be updated by filtering the signal $res_{LP}(n) - u(n)$ (difference between
residual and excitation) through the filters $1/\hat{A}(z)$ and $A(z/\gamma_1)/A(z/\gamma_2)$ for the 40-sample subframe and saving
the states of the filters. This would require 3 filterings. A simpler approach which requires only one filtering is as
follows. The local synthesized speech, $\hat{s}(n)$, is computed by filtering the excitation signal through $1/\hat{A}(z)$. The
output of the filter due to the input $res_{LP}(n) - u(n)$ is equivalent to $e(n) = s(n) - \hat{s}(n)$. So the states of the
synthesis filter $1/\hat{A}(z)$ are given by $e(n)$, $n = 30, \ldots, 39$. Updating the states of the filter $A(z/\gamma_1)/A(z/\gamma_2)$ can
be done by filtering the error signal $e(n)$ through this filter to find the perceptually weighted error $e_w(n)$. However,
the signal $e_w(n)$ can be equivalently found by:

$$e_w(n) = x(n) - \hat{g}_p\, y(n) - \hat{g}_c\, z(n). \qquad (65)$$

Since the signals $x(n)$, $y(n)$, and $z(n)$ are available, the states of the weighting filter are updated by computing
$e_w(n)$ as in equation (65) for $n = 30, \ldots, 39$. This saves two filterings.

4.75 kbit/s mode 

The memory update in the first and third subframes uses the unquantized gains in equation (64). After the second and
fourth subframes respectively, when the gains are quantized, the state is recalculated using the quantized gains. 



6 Functional description of the decoder 

The function of the decoder consists of decoding the transmitted parameters (LP parameters, adaptive codebook vector, 
adaptive codebook gain, fixed codebook vector, fixed codebook gain) and performing synthesis to obtain the 
reconstructed speech. The reconstructed speech is then post-filtered and upscaled. The signal flow at the decoder is 
shown in figure 4. 

6.1 Decoding and speech synthesis 

The decoding process is performed in the following order: 

Decoding of LP filter parameters: The received indices of LSP quantization are used to reconstruct the quantified 
LSP vectors. The interpolation described in clause 5.2.6 is performed to obtain 4 interpolated LSP vectors 
(corresponding to 4 subframes). For each subframe, the interpolated LSP vector is converted to LP filter coefficient 
domain $a_k$, which is used for synthesizing the reconstructed speech in the subframe.

The following steps are repeated for each subframe: 

1) Decoding of the adaptive codebook vector: The received pitch index (adaptive codebook index) is used to find
the integer and fractional parts of the pitch lag. The adaptive codebook vector $v(n)$ is found by interpolating the
past excitation $u(n)$ (at the pitch delay) using the FIR filter described in clause 5.6.

2) Decoding of the innovative codebook vector: The received algebraic codebook index is used to extract the
positions and amplitudes (signs) of the excitation pulses and to find the algebraic codevector $c(n)$. If the integer
part of the pitch lag, $T$, is less than the subframe size 40, the pitch sharpening procedure is applied which
translates into modifying $c(n)$ by $c(n) = c(n) + \beta\, c(n - T)$, where $\beta$ is the decoded pitch gain, $\hat{g}_p$,
bounded by [0.0, 1.0] or [0.0, 0.8], depending on mode.

3) Decoding of the adaptive and fixed codebook gains: In case of scalar quantization of the gains (12.2 kbit/s and
7.95 kbit/s modes) the received indices are used to readily find the quantified adaptive codebook gain, $\hat{g}_p$, and
the quantified fixed codebook gain correction factor, $\hat{\gamma}_{gc}$, from the corresponding quantization tables. In case of
vector quantization of the gains (all other modes), the received index gives both the quantified adaptive
codebook gain, $\hat{g}_p$, and the quantified fixed codebook gain correction factor, $\hat{\gamma}_{gc}$. The estimated fixed
codebook gain $g'_c$ is found as described in clause 5.8.2. First, the predicted energy is found by:

$$\tilde{E}(n) = \sum_{i=1}^{4} b_i\, \hat{R}(n-i) \qquad (66)$$

and then the mean innovation energy is found by:

$$E_I = 10 \log \left( \frac{1}{N} \sum_{j=0}^{N-1} c^2(j) \right). \qquad (67)$$

The predicted gain $g'_c$ is found by:

$$g'_c = 10^{0.05 (\tilde{E}(n) + \bar{E} - E_I)}. \qquad (68)$$

The quantified fixed codebook gain is given by:

$$\hat{g}_c = \hat{\gamma}_{gc}\, g'_c. \qquad (69)$$



4) Smoothing of the fixed codebook gain (10.2, 6.70, 5.90, 5.15, 4.75 kbit/s modes): An adaptive smoothing of 
the fixed codebook gain is performed to avoid unnatural fluctuations in the energy contour. The smoothing is 
based on a measure of the stationarity of the short-term spectrum in the q domain. The smoothing strength is 
computed from this measure. An averaged q-value is computed for each frame n by: 



$$\bar{q}(n) = 0.84 \cdot \bar{q}(n-1) + 0.16 \cdot \hat{q}_4(n). \qquad (70)$$

For each subframe $m$, a difference measure between the averaged vector and the quantized and interpolated vector is
computed by:

$$diff_m = \sum_j \frac{\left| \bar{q}_j(n) - \hat{q}_j^{(m)}(n) \right|}{\bar{q}_j(n)}, \qquad (71)$$

where $j$ runs over the 10 LSPs. Furthermore, a smoothing factor, $k_m$, is computed by:

$$k_m = \min(K_2, \max(0, diff_m - K_1)) / K_2, \qquad (72)$$

where the constants are set to $K_1 = 0.4$ and $K_2 = 0.25$. A hangover period of 40 subframes is used where the $k_m$
value is set to 1.0 if the $diff_m$ has been above 0.65 for 10 consecutive frames. A value of 1.0 corresponds to no
smoothing. An averaged fixed codebook gain value is computed for each subframe by:



$$\bar{g}_c(m) = \frac{1}{5} \sum_{i=0}^{4} \hat{g}_c(m-i). \qquad (73)$$

The fixed codebook gain used for synthesis is now replaced by a smoothed value given by:

$$\hat{g}_c = \hat{g}_c \cdot k_m + \bar{g}_c \cdot (1 - k_m). \qquad (74)$$
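
The smoothing factor and the blended gain can be sketched in Python as follows. The sketch uses the constants 0.4 and 0.25 given above, takes the difference measure and the averaged gain as inputs, and leaves out the hangover logic; the function names are hypothetical.

```python
K1, K2 = 0.4, 0.25  # constants of the smoothing factor

def smoothing_factor(diff_m):
    """Smoothing factor k_m in [0, 1]; 1.0 corresponds to no smoothing."""
    return min(K2, max(0.0, diff_m - K1)) / K2

def smoothed_gain(gc, gc_mean, k_m):
    """Blend of the decoded fixed codebook gain and its recent average."""
    return gc * k_m + gc_mean * (1.0 - k_m)
```

A stationary spectrum (small difference measure) gives a factor of 0, so the averaged gain is used, whereas a difference measure of 0.65 or more gives a factor of 1 and leaves the decoded gain unchanged.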

5) Anti-sparseness processing (7.95, 6.70, 5.90, 5.15, 4.75 kbit/s modes): An adaptive anti-sparseness post-processing
procedure is applied to the fixed codebook vector c(n) in order to reduce perceptual artefacts arising
from the sparseness of the algebraic fixed codebook vectors with only a few non-zero samples per subframe. The
anti-sparseness processing consists of circular convolution of the fixed codebook vector with an impulse
response. Three pre-stored impulse responses are used and a number impNr = 0, 1, 2 is set to select one of
them. A value of 2 corresponds to no modification, a value of 1 corresponds to medium modification, while a
value of 0 corresponds to strong modification. The selection of the impulse response is performed adaptively
from the adaptive and fixed codebook gains. The following procedure is employed:

if $\hat{g}_p$ < 0.6 then
    impNr = 0;
else if $\hat{g}_p$ < 0.9 then
    impNr = 1;
else
    impNr = 2;

Detect onset by comparing the fixed codebook gain to the previous fixed codebook gain. If the current value is more
than twice the previous value, an onset is detected.

If not onset and impNr = 0, the median filtered value of the current and the previous 4 adaptive codebook gains is
computed. If this value is less than 0.6, impNr = 0.

If not onset, the impNr-value is restricted to increase by one step from the previous subframe.

If an onset is declared, the impNr-value is increased by one if it is less than 2.
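
The selection procedure above can be sketched informatively in Python as follows. The function and argument names are hypothetical. Two points are assumptions of this sketch rather than statements of the present document: the median check is applied whenever no onset is detected, and the onset decision is taken for the current subframe only, without any memory. The bit-exact behaviour is defined by the ANSI-C code in [4].

```python
import statistics


def select_imp_nr(g_p, g_c, prev_g_c, prev_imp_nr, past_g_p):
    """Return (impNr, onset) for one subframe.

    g_p, g_c    -- decoded adaptive and fixed codebook gains
    prev_g_c    -- fixed codebook gain of the previous subframe
    prev_imp_nr -- impNr selected in the previous subframe
    past_g_p    -- the previous 4 adaptive codebook gains
    """
    if g_p < 0.6:
        imp_nr = 0
    elif g_p < 0.9:
        imp_nr = 1
    else:
        imp_nr = 2

    # Onset: current fixed codebook gain more than twice the previous one.
    onset = g_c > 2.0 * prev_g_c

    if not onset:
        # Assumption: median check applied whenever there is no onset.
        if statistics.median([g_p] + list(past_g_p)) < 0.6:
            imp_nr = 0
        # Allow an increase of at most one step from the previous subframe.
        imp_nr = min(imp_nr, prev_imp_nr + 1)
    elif imp_nr < 2:
        imp_nr += 1
    return imp_nr, onset
```

For example, a high adaptive codebook gain following a subframe with impNr = 0 and no onset yields impNr = 1 rather than 2, because of the one-step restriction.
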

6) Computing the reconstructed speech: The excitation at the input of the synthesis filter is given by:

$$u(n) = \hat{g}_p v(n) + \hat{g}_c c(n) \tag{75}$$

Before the speech synthesis, a post-processing of excitation elements is performed. This means that the total excitation
is modified by emphasizing the contribution of the adaptive codebook vector:

$$\hat{u}(n) = \begin{cases} u(n) + 0.25\,\beta\,\hat{g}_p v(n), & \hat{g}_p > 0.5,\ \text{12.2 kbit/s mode} \\ u(n) + 0.5\,\beta\,\hat{g}_p v(n), & \hat{g}_p > 0.5,\ \text{all other modes} \\ u(n), & \hat{g}_p \le 0.5 \end{cases} \tag{76}$$

Adaptive gain control (AGC) is used to compensate for the gain difference between the non-emphasized excitation
$u(n)$ and the emphasized excitation $\hat{u}(n)$. The gain scaling factor $\eta$ for the emphasized excitation is computed by:

$$\eta = \begin{cases} \sqrt{\dfrac{\sum_{n=0}^{39} u^2(n)}{\sum_{n=0}^{39} \hat{u}^2(n)}}, & \hat{g}_p > 0.5 \\ 1.0, & \hat{g}_p \le 0.5 \end{cases} \tag{77}$$

The gain-scaled emphasized excitation signal $\hat{u}'(n)$ is given by:

$$\hat{u}'(n) = \hat{u}(n)\,\eta \tag{78}$$
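
The excitation post-processing of equations (75) to (78) can be illustrated by the following informative Python sketch. The function name and the `mode_12k2` flag are hypothetical, and `beta` is passed in as a parameter because its definition lies outside this passage. The gain scaling factor is taken as the square root of the ratio of the subframe energies of the non-emphasized and the emphasized excitation.

```python
import math


def emphasize_excitation(v, c, g_p, g_c, beta, mode_12k2=False):
    """Floating-point sketch of equations (75) to (78) for one subframe.

    v, c     -- adaptive and fixed codebook vectors
    g_p, g_c -- decoded adaptive and fixed codebook gains
    """
    # (75) total excitation
    u = [g_p * vn + g_c * cn for vn, cn in zip(v, c)]
    if g_p <= 0.5:
        return u  # no emphasis, scaling factor 1.0
    # (76) emphasize the adaptive codebook contribution
    factor = 0.25 if mode_12k2 else 0.5
    u_hat = [un + factor * beta * g_p * vn for un, vn in zip(u, v)]
    # (77) AGC factor restoring the energy of u(n)
    e_u = sum(x * x for x in u)
    e_hat = sum(x * x for x in u_hat)
    eta = math.sqrt(e_u / e_hat) if e_hat > 0.0 else 1.0
    # (78) gain-scaled emphasized excitation
    return [x * eta for x in u_hat]
```

The returned subframe has the same energy as the non-emphasized excitation, while the relative weight of the adaptive codebook vector is increased.
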

The reconstructed speech for the subframe of size 40 is given by:

$$\hat{s}(n) = \hat{u}'(n) - \sum_{i=1}^{10} \hat{a}_i \hat{s}(n-i), \quad n = 0, \ldots, 39 \tag{79}$$

where $\hat{a}_i$ are the interpolated LP filter coefficients.
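
The recursion of equation (79) is an all-pole synthesis filter. A minimal floating-point sketch in Python is given below for information; the function name and the way the filter memory is passed are assumptions of this sketch.

```python
def synthesize(excitation, a, memory):
    """All-pole synthesis filtering as in equation (79).

    excitation -- excitation samples of one subframe
    a          -- LP coefficients a_1 ... a_M (the leading coefficient 1 is implicit)
    memory     -- the last M output samples of the previous subframe, oldest first
    Returns (speech, new_memory).
    """
    order = len(a)
    hist = list(memory)
    out = []
    for x in excitation:
        # s(n) = u'(n) - sum_{i=1..M} a_i * s(n-i)
        s = x - sum(a[i] * hist[-1 - i] for i in range(order))
        out.append(s)
        hist.append(s)
    return out, hist[-order:]
```

The returned memory is fed back into the next call so that the filter state is continuous across subframes.
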

7) Additional instability protection: An additional instability protection is implemented in the speech decoder
which monitors overflows in the synthesis filter. If an overflow has occurred in the synthesis part, the whole
adaptive codebook memory, $v(n),\ n = -(143+11), \ldots, 39$, is scaled down by a factor of 4, and the synthesis
filtering is repeated using this down-scaled memory, i.e. in this case step 6) is repeated, except that the
post-processing in (76) - (78) of the excitation signal is by-passed.

The synthesized speech $\hat{s}(n)$ is then passed through an adaptive postfilter which is described in the following clause.

6.2 Post-processing 

6.2.1 Adaptive post-filtering (all modes) 

The adaptive postfilter is the cascade of two filters: a formant postfilter, and a tilt compensation filter. The postfilter is 
updated every subframe of 5 ms. 

The formant postfilter is given by:

$$H_f(z) = \frac{\hat{A}(z/\gamma_n)}{\hat{A}(z/\gamma_d)} \tag{80}$$

where $\hat{A}(z)$ is the received quantified (and interpolated) LP inverse filter (LP analysis is not performed at the
decoder), and the factors $\gamma_n$ and $\gamma_d$ control the amount of the formant post-filtering.

Finally, the filter $H_t(z)$ compensates for the tilt in the formant postfilter $H_f(z)$ and is given by:

$$H_t(z) = 1 - \mu z^{-1} \tag{81}$$

where $\mu = \gamma_t k_1'$ is a tilt factor, with $k_1'$ being the first reflection coefficient calculated on the truncated ($L_h = 22$)
impulse response, $h_f(n)$, of the filter $\hat{A}(z/\gamma_n)/\hat{A}(z/\gamma_d)$. $k_1'$ is given by:

$$k_1' = \frac{r_h(1)}{r_h(0)}; \qquad r_h(i) = \sum_{j=0}^{L_h - i - 1} h_f(j)\, h_f(j+i) \tag{82}$$
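
For information, the first reflection coefficient of equation (82) could be obtained as in the following Python sketch. The function name is hypothetical, the truncation length is passed in by the caller, and the LP polynomial is assumed to have the form 1 + a_1 z^-1 + ... + a_M z^-M, in line with the sign convention of equation (79).

```python
def tilt_coefficient(a, gamma_n, gamma_d, l_h):
    """First reflection coefficient k_1' of the truncated impulse response.

    a       -- LP coefficients a_1 ... a_M
    gamma_n -- weighting factor of the numerator polynomial
    gamma_d -- weighting factor of the denominator polynomial
    l_h     -- number of impulse response samples that are kept
    """
    # Bandwidth-expanded polynomials: coefficient i is scaled by gamma**i.
    num = [1.0] + [ai * gamma_n ** (i + 1) for i, ai in enumerate(a)]
    den = [1.0] + [ai * gamma_d ** (i + 1) for i, ai in enumerate(a)]
    # Impulse response of num(z)/den(z), truncated to l_h samples.
    h = []
    for n in range(l_h):
        x = num[n] if n < len(num) else 0.0
        y = x - sum(den[i] * h[n - i] for i in range(1, len(den)) if n - i >= 0)
        h.append(y)
    # Autocorrelation at lags 0 and 1, then their ratio.
    r0 = sum(hj * hj for hj in h)
    r1 = sum(h[j] * h[j + 1] for j in range(l_h - 1))
    return r1 / r0
```

The tilt factor is then the product of this coefficient and the mode-dependent tilt weighting factor.
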

The post-filtering process is performed as follows. First, the synthesized speech $\hat{s}(n)$ is inverse filtered through
$\hat{A}(z/\gamma_n)$ to produce the residual signal $\hat{r}(n)$. The signal $\hat{r}(n)$ is filtered by the synthesis filter $1/\hat{A}(z/\gamma_d)$. Finally,
the signal at the output of the synthesis filter $1/\hat{A}(z/\gamma_d)$ is passed to the tilt compensation filter $H_t(z)$ resulting in
the post-filtered speech signal $\hat{s}_f(n)$.

Adaptive gain control (AGC) is used to compensate for the gain difference between the synthesized speech signal $\hat{s}(n)$
and the post-filtered signal $\hat{s}_f(n)$. The gain scaling factor $\gamma_{sc}$ for the present subframe is computed by:

$$\gamma_{sc} = \sqrt{\frac{\sum_{n=0}^{39} \hat{s}^2(n)}{\sum_{n=0}^{39} \hat{s}_f^2(n)}} \tag{83}$$

The gain-scaled post-filtered signal $\hat{s}'(n)$ is given by:

$$\hat{s}'(n) = \beta_{sc}(n)\, \hat{s}_f(n) \tag{84}$$

where $\beta_{sc}(n)$ is updated on a sample-by-sample basis and given by:

$$\beta_{sc}(n) = \alpha\, \beta_{sc}(n-1) + (1 - \alpha)\, \gamma_{sc} \tag{85}$$

where $\alpha$ is an AGC factor with a value of 0.9.
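
The sample-by-sample gain control can be sketched informatively in Python as shown below. The function name and the handling of a silent subframe are assumptions of this sketch; the gain scaling factor is computed as the square root of the ratio of the subframe energies of the synthesized and the post-filtered signal.

```python
import math

ALPHA = 0.9  # AGC factor


def agc(s, s_f, beta_prev):
    """Scale the post-filtered subframe s_f towards the energy of s.

    beta_prev is the smoothed gain at the end of the previous subframe.
    Returns (scaled_signal, beta_last) so that the state carries over.
    """
    e_s = sum(x * x for x in s)
    e_f = sum(x * x for x in s_f)
    # Assumption: a silent post-filtered subframe gives a scaling factor of 0.
    gamma_sc = math.sqrt(e_s / e_f) if e_f > 0.0 else 0.0
    out = []
    beta = beta_prev
    for x in s_f:
        # First-order smoothing of the gain, updated for every sample.
        beta = ALPHA * beta + (1.0 - ALPHA) * gamma_sc
        out.append(beta * x)
    return out, beta
```

Because the gain is smoothed with a factor of 0.9, a change of the scaling factor between subframes is applied gradually rather than as a step.
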

12.2, 10.2 kbit/s modes

The adaptive post-filtering factors are given by: $\gamma_n = 0.7$, $\gamma_d = 0.75$ and

$$\gamma_t = \begin{cases} 0.8, & k_1' > 0, \\ 0, & \text{otherwise.} \end{cases} \tag{86}$$

7.95, 7.40, 6.70, 5.90, 5.15, 4.75 kbit/s modes

The adaptive post-filtering factors are given by: $\gamma_n = 0.55$, $\gamma_d = 0.7$ and $\gamma_t = 0.8$.

6.2.2 High-pass filtering and up-scaling (all modes) 

The high-pass filter serves as a precaution against undesired low frequency components. A filter cut-off frequency of 60
Hz is used, and the filter is given by

$$H_{h2}(z) = \frac{0.939819335 - 1.879638672 z^{-1} + 0.939819335 z^{-2}}{1 - 1.933105469 z^{-1} + 0.935913085 z^{-2}} \tag{87}$$



Up-scaling consists of multiplying the post-filtered speech by a factor of 2 to compensate for the down-scaling by 2 
which is applied to the input signal. 
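
As an informative illustration, the second-order high-pass filter followed by the up-scaling by 2 can be written in Python as follows. The function name and the direct-form I structure with an explicit state tuple are choices of this sketch; the coefficients are those of equation (87).

```python
B = (0.939819335, -1.879638672, 0.939819335)   # numerator coefficients
A = (1.0, -1.933105469, 0.935913085)           # denominator coefficients


def highpass_and_upscale(x, state=(0.0, 0.0, 0.0, 0.0)):
    """Filter x with the high-pass filter and multiply the result by 2.

    state = (x[n-1], x[n-2], y[n-1], y[n-2]) is carried between calls.
    Returns (output, new_state).
    """
    x1, x2, y1, y2 = state
    out = []
    for xn in x:
        yn = B[0] * xn + B[1] * x1 + B[2] * x2 - A[1] * y1 - A[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        out.append(2.0 * yn)  # up-scaling by a factor of 2
    return out, (x1, x2, y1, y2)
```

A constant (DC) input decays towards zero at the output, which is the intended suppression of low frequency components.
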

7 Detailed bit allocation of the adaptive multi-rate codec

The detailed allocation of the bits in the adaptive multi-rate speech encoder is shown for each mode in tables 9a to 9h.
These tables show the order of the bits produced by the speech encoder. Note that the most significant bit (MSB) of
each codec parameter is always sent first.
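
The relation between the frame sizes given in the table captions and the bit rates of the modes can be checked with a short informative Python sketch. The dictionary and function names are hypothetical; the frame sizes and the 20 ms frame duration are those stated in the captions of tables 9a to 9h.

```python
# Bits per 20 ms speech frame for each mode (mode name in kbit/s).
FRAME_BITS = {
    "12.2": 244, "10.2": 204, "7.95": 159, "7.40": 148,
    "6.70": 134, "5.90": 118, "5.15": 103, "4.75": 95,
}
FRAME_MS = 20


def bit_rate_kbps(bits_per_frame, frame_ms=FRAME_MS):
    """Bits per millisecond, which is numerically the rate in kbit/s."""
    return bits_per_frame / frame_ms


if __name__ == "__main__":
    for mode, bits in FRAME_BITS.items():
        print(mode, "kbit/s mode:", bits, "bits ->", bit_rate_kbps(bits), "kbit/s")
```
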

Table 9a: Source encoder output parameters in order of occurrence and bit allocation within the
speech frame of 244 bits/20 ms, 12.2 kbit/s mode.

Bits (MSB-LSB)    Description
s1 - s7           index of 1st LSF submatrix
s8 - s15          index of 2nd LSF submatrix
s16 - s23         index of 3rd LSF submatrix
s24               sign of 3rd LSF submatrix
s25 - s32         index of 4th LSF submatrix
s33 - s38         index of 5th LSF submatrix
subframe 1
s39 - s47         adaptive codebook index
s48 - s51         adaptive codebook gain
s52               sign information for 1st and 6th pulses
s53 - s55         position of 1st pulse
s56               sign information for 2nd and 7th pulses
s57 - s59         position of 2nd pulse
s60               sign information for 3rd and 8th pulses
s61 - s63         position of 3rd pulse
s64               sign information for 4th and 9th pulses
s65 - s67         position of 4th pulse
s68               sign information for 5th and 10th pulses
s69 - s71         position of 5th pulse
s72 - s74         position of 6th pulse
s75 - s77         position of 7th pulse
s78 - s80         position of 8th pulse
s81 - s83         position of 9th pulse
s84 - s86         position of 10th pulse
s87 - s91         fixed codebook gain
subframe 2
s92 - s97         adaptive codebook index (relative)
s98 - s141        same description as s48 - s91
subframe 3
s142 - s194       same description as s39 - s91
subframe 4
s195 - s244       same description as s92 - s141
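
To illustrate how the bit positions of the 12.2 kbit/s mode map to parameters, the following informative Python sketch builds the field layout of the frame and unpacks a bit sequence with the MSB of each parameter first. All field names are hypothetical labels introduced for this sketch; only the field widths and their order are taken from table 9a.

```python
# LSF part of the frame: (label, width in bits), in order of occurrence.
LSF_FIELDS = [("lsf1", 7), ("lsf2", 8), ("lsf3", 8), ("lsf3_sign", 1),
              ("lsf4", 8), ("lsf5", 6)]


def subframe_fields(relative_lag):
    """Fields of one subframe; subframes 2 and 4 carry a 6-bit relative index."""
    fields = [("pitch_index", 6 if relative_lag else 9), ("pitch_gain", 4)]
    for k in range(1, 6):
        # One sign bit shared by pulses k and k+5, then the position of pulse k.
        fields += [("sign_%d" % k, 1), ("pos_%d" % k, 3)]
    fields += [("pos_%d" % k, 3) for k in range(6, 11)]
    fields.append(("code_gain", 5))
    return fields


def frame_layout():
    """Complete list of (label, width) pairs for one 12.2 kbit/s frame."""
    layout = list(LSF_FIELDS)
    for sf, relative in enumerate((False, True, False, True), start=1):
        layout += [("sf%d_%s" % (sf, name), width)
                   for name, width in subframe_fields(relative)]
    return layout


def read_fields(bits, layout):
    """Unpack bits s1, s2, ... (a list of 0/1 values) into integers, MSB first."""
    values, pos = {}, 0
    for name, width in layout:
        value = 0
        for bit in bits[pos:pos + width]:
            value = (value << 1) | bit
        values[name] = value
        pos += width
    return values
```

The widths of the layout add up to 244 bits, and a frame in which only s39 is set decodes to an adaptive codebook index of 256 for subframe 1, since s39 is the MSB of a 9-bit field.
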

Table 9b: Source encoder output parameters in order of occurrence and bit allocation within the
speech frame of 204 bits/20 ms, 10.2 kbit/s mode.

Bits (MSB-LSB)    Description
s1 - s8           index of 1st LSF subvector
s9 - s17          index of 2nd LSF subvector
s18 - s26         index of 3rd LSF subvector
subframe 1
s27 - s34         adaptive codebook index
s35               sign information for 1st and 5th pulses
s36               sign information for 2nd and 6th pulses
s37               sign information for 3rd and 7th pulses
s38               sign information for 4th and 8th pulses
s39 - s48         position for 1st, 2nd, and 5th pulses
s49 - s58         position for 3rd, 6th, and 7th pulses
s59 - s65         position for 4th and 8th pulses
s66 - s72         codebook gains
subframe 2
s73 - s77         adaptive codebook index (relative)
s78 - s115        same description as s35 - s72
subframe 3
s116 - s161       same description as s27 - s72
subframe 4
s162 - s204       same description as s73 - s115



Table 9c: Source encoder output parameters in order of occurrence and bit allocation within the
speech frame of 159 bits/20 ms, 7.95 kbit/s mode.

Bits (MSB-LSB)    Description
s1 - s9           index of 1st LSF subvector
s10 - s18         index of 2nd LSF subvector
s19 - s27         index of 3rd LSF subvector
subframe 1
s28 - s35         adaptive codebook index
s36 - s39         position of 4th pulse
s40 - s42         position of 3rd pulse
s43 - s45         position of 2nd pulse
s46 - s48         position of 1st pulse
s49               sign information for 4th pulse
s50               sign information for 3rd pulse
s51               sign information for 2nd pulse
s52               sign information for 1st pulse
s53 - s56         adaptive codebook gain
s57 - s61         fixed codebook gain
subframe 2
s62 - s67         adaptive codebook index (relative)
s68 - s93         same description as s36 - s61
subframe 3
s94 - s127        same description as s28 - s61
subframe 4
s128 - s159       same description as s62 - s93

Table 9d: Source encoder output parameters in order of occurrence and bit allocation within the
speech frame of 148 bits/20 ms, 7.40 kbit/s mode.

Bits (MSB-LSB)    Description
s1 - s8           index of 1st LSF subvector
s9 - s17          index of 2nd LSF subvector
s18 - s26         index of 3rd LSF subvector
subframe 1
s27 - s34         adaptive codebook index
s35 - s38         position of 4th pulse
s39 - s41         position of 3rd pulse
s42 - s44         position of 2nd pulse
s45 - s47         position of 1st pulse
s48               sign information for 4th pulse
s49               sign information for 3rd pulse
s50               sign information for 2nd pulse
s51               sign information for 1st pulse
s52 - s58         codebook gains
subframe 2
s59 - s63         adaptive codebook index (relative)
s64 - s87         same description as s35 - s58
subframe 3
s88 - s119        same description as s27 - s58
subframe 4
s120 - s148       same description as s59 - s87



Table 9e: Source encoder output parameters in order of occurrence and bit allocation within the
speech frame of 134 bits/20 ms, 6.70 kbit/s mode.

Bits (MSB-LSB)    Description
s1 - s8           index of 1st LSF subvector
s9 - s17          index of 2nd LSF subvector
s18 - s26         index of 3rd LSF subvector
subframe 1
s27 - s34         adaptive codebook index
s35 - s38         position of 3rd pulse
s39 - s42         position of 2nd pulse
s43 - s45         position of 1st pulse
s46               sign information for 3rd pulse
s47               sign information for 2nd pulse
s48               sign information for 1st pulse
s49 - s55         codebook gains
subframe 2
s56 - s59         adaptive codebook index (relative)
s60 - s80         same description as s35 - s55
subframe 3
s81 - s109        same description as s27 - s55
subframe 4
s110 - s134       same description as s56 - s80

Table 9f: Source encoder output parameters in order of occurrence and bit allocation within the
speech frame of 118 bits/20 ms, 5.90 kbit/s mode.

Bits (MSB-LSB)    Description
s1 - s8           index of 1st LSF subvector
s9 - s17          index of 2nd LSF subvector
s18 - s26         index of 3rd LSF subvector
subframe 1
s27 - s34         adaptive codebook index
s35 - s39         position of 2nd pulse
s40 - s43         position of 1st pulse
s44               sign information for 2nd pulse
s45               sign information for 1st pulse
s46 - s51         codebook gains
subframe 2
s52 - s55         adaptive codebook index (relative)
s56 - s72         same description as s35 - s51
subframe 3
s73 - s97         same description as s27 - s51
subframe 4
s98 - s118        same description as s52 - s72



Table 9g: Source encoder output parameters in order of occurrence and bit allocation within the
speech frame of 103 bits/20 ms, 5.15 kbit/s mode.

Bits (MSB-LSB)    Description
s1 - s8           index of 1st LSF subvector
s9 - s16          index of 2nd LSF subvector
s17 - s23         index of 3rd LSF subvector
subframe 1
s24 - s31         adaptive codebook index
s32               position subset
s33 - s35         position of 2nd pulse
s36 - s38         position of 1st pulse
s39               sign information for 2nd pulse
s40               sign information for 1st pulse
s41 - s46         codebook gains
subframe 2
s47 - s50         adaptive codebook index (relative)
s51 - s65         same description as s32 - s46
subframe 3
s66 - s84         same description as s47 - s65
subframe 4
s85 - s103        same description as s47 - s65

Table 9h: Source encoder output parameters in order of occurrence and bit allocation within the
speech frame of 95 bits/20 ms, 4.75 kbit/s mode.

Bits (MSB-LSB)    Description
s1 - s8           index of 1st LSF subvector
s9 - s16          index of 2nd LSF subvector
s17 - s23         index of 3rd LSF subvector
subframe 1
s24 - s31         adaptive codebook index
s32               position subset
s33 - s35         position of 2nd pulse
s36 - s38         position of 1st pulse
s39               sign information for 2nd pulse
s40               sign information for 1st pulse
s41 - s48         codebook gains
subframe 2
s49 - s52         adaptive codebook index (relative)
s53 - s61         same description as s32 - s40
subframe 3
s62 - s65         same description as s49 - s52
s66 - s82         same description as s32 - s48
subframe 4
s83 - s95         same description as s49 - s61



8 Homing sequences

8.1 Functional description



The adaptive multi-rate speech codec is described in a bit-exact arithmetic to allow for easy type approval as well as 
general testing purposes of the adaptive multi-rate speech codec. 

The response of the codec to a predefined input sequence can only be foreseen if the internal state variables of the codec 
are in a predefined state at the beginning of the experiment. Therefore, the codec has to be put in a so-called home state
before a bit-exact test can be performed. This is usually done by a reset (a procedure in which the internal state 
variables of the codec are set to their defined initial values). The codec mode of the speech encoder and speech decoder 
shall be set to the tested codec mode by external means at reset. 

To allow a reset of the codec in remote locations, special homing frames have been defined for the encoder and the 
decoder, thus enabling a codec homing by inband signalling. 

The codec homing procedure is defined in such a way that, in either direction (encoder or decoder), the homing
functions are called after processing the homing frame that is input. The output corresponding to the first homing frame
is therefore dependent on the used codec mode and the codec state when receiving that frame and hence usually not
known. The response of the encoder to any further homing frame is by definition the corresponding decoder homing
frame for the used codec mode. The response of the decoder to any further homing frame is by definition the encoder
homing frame. This procedure allows homing of both the encoder and the decoder from either side, if a loop back
configuration is implemented, taking proper framing into account.

8.2 Definitions

Encoder homing frame: The encoder homing frame consists of 160 identical samples, each 13 bits long, with the least 
significant bit set to "one" and all other bits set to "zero". When written to 16-bit words with left justification, the 
samples have a value of 0008 hex. The speech decoder has to produce this frame as a response to the second and any 
further decoder homing frame if at least two decoder homing frames were input to the decoder consecutively. The 
encoder homing frame is identical for all codec modes. 
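
As an informative illustration of this definition, the encoder homing frame can be generated and recognized in Python as follows. The function and constant names are hypothetical; the sketch assumes that the samples are held as 16-bit words with the 13-bit value left-justified.

```python
FRAME_LEN = 160          # samples per speech frame
SAMPLE_BITS = 13         # width of one PCM sample
WORD_BITS = 16           # width of the storage word


def encoder_homing_frame():
    """160 identical samples with only the least significant bit of the 13 bits set."""
    sample_13bit = 0x0001
    left_justified = sample_13bit << (WORD_BITS - SAMPLE_BITS)  # 0x0008
    return [left_justified] * FRAME_LEN


def is_encoder_homing_frame(frame):
    """True if the frame consists of 160 left-justified homing samples."""
    return len(frame) == FRAME_LEN and all(s == 0x0008 for s in frame)
```

The shift by three bit positions is what turns the 13-bit value 0001 hex into the 16-bit word 0008 hex.
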

Decoder homing frame: There exist eight different decoder homing frames, which correspond to the eight AMR codec 
modes. Using one of these codec modes, the corresponding decoder homing frame is the natural response of the speech 
encoder to the second and any further encoder homing frame if at least two encoder homing frames were input to the 
encoder consecutively. In [4], for each decoder homing frame the parameter values are given. 

8.3 Encoder homing 

Whenever the adaptive multi-rate speech encoder receives at its input an encoder homing frame exactly aligned with its 
internal speech frame segmentation, the following events take place: 

Step 1: The speech encoder performs its normal operation including VAD and SCR and produces, in
accordance with the used codec mode, a speech parameter frame at its output which is in general
unknown. But if the speech encoder was in its home state at the beginning of that frame, then the
resulting speech parameter frame is identical to the decoder homing frame which corresponds to
the used codec mode (this is how the decoder homing frames were constructed).

Step 2: After successful termination of that operation the speech encoder invokes the homing functions
for all sub-modules including VAD and SCR and sets all state variables into their home state. On
the reception of the next input frame, the speech encoder will start from its home state.

NOTE: Applying a sequence of N encoder homing frames will cause at least N-1 decoder homing frames at the 
output of the speech encoder. 
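
The behaviour described in steps 1 and 2 and in the NOTE can be modelled by a small informative Python sketch. The class `ToyEncoder` is a hypothetical stand-in that models only the homing behaviour and returns text labels instead of speech parameters.

```python
HOMING_SAMPLE = 0x0008
FRAME_LEN = 160


def is_encoder_homing_frame(frame):
    return len(frame) == FRAME_LEN and all(s == HOMING_SAMPLE for s in frame)


class ToyEncoder:
    """Models only whether the encoder is in its home state."""

    def __init__(self, in_home_state=False):
        self.in_home_state = in_home_state

    def encode(self, frame):
        if is_encoder_homing_frame(frame):
            # Step 1: normal operation; the output is the decoder homing
            # frame only if the encoder started this frame in its home state.
            out = "decoder homing frame" if self.in_home_state else "unknown"
            # Step 2: the homing functions are called after processing.
            self.in_home_state = True
            return out
        self.in_home_state = False
        return "speech parameters"
```

Feeding N encoder homing frames to an encoder in an arbitrary state gives one frame of unknown content followed by N-1 decoder homing frames, in line with the NOTE.
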



8.4 Decoder homing 



Whenever the speech decoder receives at its input a decoder homing frame, which corresponds to the used codec mode, 
then the following events take place: 

Step 1: The speech decoder performs its normal operation and produces a speech frame at its output which
is in general unknown. But if the speech decoder was in its home state at the beginning of that
frame, then the resulting speech frame is replaced by the encoder homing frame. This would not
naturally be the case but is forced by this definition here.

Step 2: After successful termination of that operation the speech decoder invokes the homing functions
for all sub-modules including the comfort noise generator and sets all state variables into their
home state. On the reception of the next input frame, the speech decoder will start from its home
state.

NOTE 1: Applying a sequence of N decoder homing frames will cause at least N-1 encoder homing frames at the 
output of the speech decoder. 

NOTE 2: By definition (!) the first frame of each decoder test sequence must differ from the decoder homing frame
at least in one bit position within the parameters for LPC and first subframe. Therefore, if the decoder is
in its home state, it is sufficient to check only these parameters to detect a subsequent decoder homing
frame. This definition is made to support a delay-optimized implementation in the TRAU uplink
direction.
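
The shortened check permitted by NOTE 2 can be illustrated by the following informative Python sketch. The function name and its arguments are hypothetical; in particular, `prefix_len` stands for the number of parameters covering the LPC and first-subframe part of the frame, and the actual decoder homing frame parameter values are those given in [4].

```python
def is_decoder_homing_frame(params, homing_params, in_home_state, prefix_len):
    """Compare received parameters with the decoder homing frame of the used mode.

    params        -- received speech parameters in order of occurrence
    homing_params -- parameters of the decoder homing frame (from [4])
    in_home_state -- True if the decoder is currently in its home state
    prefix_len    -- number of leading parameters (LPC and first subframe)
    """
    if in_home_state:
        # In the home state, the leading parameters are sufficient.
        return list(params[:prefix_len]) == list(homing_params[:prefix_len])
    return list(params) == list(homing_params)
```

The shortened comparison allows the decision to be taken before the remaining subframes have been received, which is the motivation given for the delay-optimized implementation.
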

Figure 2: Simplified block diagram of the CELP synthesis model

Figure 4: Simplified block diagram of the adaptive multi-rate decoder